Preface


This preface introduces the ARM926EJ-S Revision r0p5 Technical Reference Manual 
(TRM). It contains the following sections: 


o About this manual on page xvii
o Feedback on page xxiii.

About this manual


This is the Technical Reference Manual for the ARM926EJ-S processor. 


Product revision status 


The rnpn identifier indicates the revision status of the product described in this manual,
where:

rn    Identifies the major revision of the product.
pn    Identifies the minor revision or modification status of the product.


Intended audience 


This manual is written for system designers, system integrators, and programmers who 
are designing or programming a System-on-Chip (SoC) that uses the ARM926EJ-S 
processor. 


Using this manual 
This document is organized into the following chapters:


Chapter 1 Introduction 
Read this for an overview of the ARM926EJ-S processor. 


Chapter 2 Programmer's Model
Read this for details of the programmer's model and ARM926EJ-S
registers.


Chapter 3 Memory Management Unit 


Read this for details of the Memory Management Unit (MMU) and 
address translation process and how to use the CP15 register to enable 
and disable the MMU. 


Chapter 4 Caches and Write Buffer 
Read this for a description of the instruction cache, the data cache, the 
write buffer, and the physical address tag RAM. 

Chapter 5 Tightly-Coupled Memory Interface 


Read this for a description of the Tightly-Coupled Memory (TCM) 
interface and how to use the CP15 region register to enable and disable 
the caches. It includes examples on how various RAM types can be 
connected.

Chapter 6 Bus Interface Unit
Read this for a description of the Bus Interface Unit (BIU) interface to 
AMBA. 

Chapter 7 Noncacheable Instruction Fetches 
Read this for a description of how speculative noncacheable instruction 
fetches are used in the ARM926EJ-S processor to improve performance. 

Chapter 8 Coprocessor Interface 
Read this for a description of the coprocessor interface. The chapter 
includes timing diagrams for coprocessor operations. 

Chapter 9 Instruction Memory Barrier 
Read this for the Instruction Memory Barrier (IMB) description and how 
IMB operations are used to ensure consistency between data and 
instruction streams processed by the ARM926EJ-S processor. 

Chapter 10 Embedded Trace Macrocell Support 
Read this to understand how Embedded Trace Macrocell (ETM) is
supported in the ARM926EJ-S processor. 

Chapter 11 Debug Support 
Read this for a description of the debug interface and EmbeddedICE-RT. 


Chapter 12 Power Management 
Read this for a description of the power management facilities provided 
by the ARM926EJ-S processor. 

Appendix A Signal Descriptions 
Read this for a description of the ARM926EJ-S processor signals in 
functional groups. 

Appendix B CP15 Test and Debug Registers 


Read this for detailed information on the registers used for test and debug. 


Glossary Read this for definitions of terms used in this book. 


Conventions that this manual can use are described in:
o Typographical on page xx
o Timing diagrams on page xx
o Signals on page xxi
o Numbering on page xxii.


Typographical 
The typographical conventions are: 


Italic Highlights important notes, introduces special terminology, 
denotes internal cross-references, and citations. 


bold Highlights interface elements, such as menu names. Denotes
signal names. Also used for terms in descriptive lists, where
appropriate.

monospace Denotes text that you can enter at the keyboard, such as
commands, file and program names, and source code.


monospace Denotes a permitted abbreviation for a command or option. You 
can enter the underlined text instead of the full command or option 
name. 


monospace italic Denotes arguments to monospace text where the argument is to be
replaced by a specific value.


monospace bold Denotes language keywords when used outside example code. 


< and > Enclose replaceable terms for assembler syntax where they appear 
in code or code fragments. For example: 


MRC p15, 0, <Rd>, <CRn>, <CRm>, <Opcode_2>


Timing diagrams 


The figure named Key to timing diagram conventions on page xxi explains the
components used in timing diagrams. Variations, when they occur, have clear labels.
You must not assume any timing information that is not explicit in the diagrams.

Shaded bus and signal areas are undefined, so the bus or signal can assume any value
within the shaded area at that time. The actual level is unimportant and does not affect
normal operation.

[Figure: key showing the timing diagram symbols for Clock, HIGH to LOW, Transient,
HIGH/LOW to HIGH, Bus stable, Bus to high impedance, Bus change, and High impedance
to stable bus]

Key to timing diagram conventions


Single-bit signals are sometimes shown as HIGH and LOW at the same time and they
look similar to the bus change shown in Key to timing diagram conventions. If a
single-bit signal is shown like this then its value does not affect the accompanying
description.


Signals 


The signal conventions are: 


Signal level    The level of an asserted signal depends on whether the signal is
                active-HIGH or active-LOW. Asserted means:
                o HIGH for active-HIGH signals
                o LOW for active-LOW signals.

Lower-case n    At the start or end of a signal name denotes an active-LOW signal.

Prefix A        Denotes global Advanced eXtensible Interface (AXI) signals.

Prefix AR       Denotes AXI read address channel signals.

Prefix AW       Denotes AXI write address channel signals.

Prefix B        Denotes AXI write response channel signals.

Prefix C        Denotes AXI low-power interface signals.

Prefix H        Denotes Advanced High-performance Bus (AHB) signals.

Prefix P        Denotes Advanced Peripheral Bus (APB) signals.

Prefix R        Denotes AXI read data channel signals.

Prefix W        Denotes AXI write data channel signals.


Numbering 
The numbering convention is:

<size in bits>'<base><number>

This is a Verilog method of abbreviating constant numbers. For example:
o 'h7B4 is an unsized hexadecimal value.
o 'o7654 is an unsized octal value.
o 8'd9 is an eight-bit wide decimal value of 9.
o 8'h3F is an eight-bit wide hexadecimal value of 0x3F. This is
  equivalent to b00111111.
o 8'b1111 is an eight-bit wide binary value of b00001111.
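
The examples above can be checked with a short Python sketch. It is only an
illustration of the <size in bits>'<base><number> notation; the function name
parse_sized_constant is hypothetical and the sketch covers only the four base
letters used in this manual.

```python
# Base letters used by the numbering convention (b, o, d, h).
BASES = {"b": 2, "o": 8, "d": 10, "h": 16}

def parse_sized_constant(text):
    """Split <size>'<base><number> into (size_in_bits or None, value)."""
    size, _, rest = text.partition("'")
    if not rest or rest[0].lower() not in BASES:
        raise ValueError("expected <size>'<base><number>")
    value = int(rest[1:], BASES[rest[0].lower()])
    # An empty size means the constant is unsized.
    return (int(size) if size else None, value)

size, value = parse_sized_constant("8'b1111")
# Pad the value to its declared width, as in the last example above.
print(format(value, "0{}b".format(size)))
```

The final line pads the value to the declared width, which is how 8'b1111
becomes b00001111.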


Additional reading 
This section lists publications by ARM and by third parties. 


ARM provides updates and corrections to its documentation. See http://www.arm.com
for current errata sheets, addenda, and the Frequently Asked Questions list.


ARM publications 


This manual contains information that is specific to the ARM926EJ-S processor. See
the following documents for other relevant information:

o ARM Architecture Reference Manual (ARM DDI 0100)
o ARM AMBA Specification (Rev 2.0) (ARM IHI 0001)
o ARM926EJ-S Implementation Guide (ARM DII 0015)
o ARM926EJ-S Test Chip Implementation Guide (ARM DXI 0131)
o ARM9EJ-S Technical Reference Manual (ARM DDI 0222)
o Multi-layer AHB Overview (ARM DVI 0045)
o ETM9 Technical Reference Manual (ARM DDI 0157).

Feedback


ARM welcomes feedback on the ARM926EJ-S processor and its documentation. 


Feedback on this product 


If you have any comments or suggestions about this product, contact your supplier and
give:
o the product name
o a concise explanation.


Feedback on this manual 


If you have any comments on this manual, send an e-mail to errata@arm.com. Give:


o the title 

o the number 

o the relevant page number(s) to which your comments apply 
o a concise explanation of your comments. 


ARM also welcomes general suggestions for additions and improvements.

Chapter 1
Introduction 


This chapter introduces the ARM926EJ-S processor and its features. It contains the
following section: 


o About the ARM926EJ-S processor on page 1-2.

1.1 About the ARM926EJ-S processor


The ARM926EJ-S processor is a member of the ARM9 family of general-purpose
microprocessors. The ARM926EJ-S processor is targeted at multi-tasking applications
where full memory management, high performance, low die size, and low power are all 
important. 


The ARM926EJ-S processor supports the 32-bit ARM and 16-bit Thumb instruction 
sets, enabling you to trade off between high performance and high code density. The 
ARM926EJ-S processor includes features for efficient execution of Java byte codes,
providing Java performance similar to JIT, but without the associated code overhead. 


The ARM926EJ-S processor supports the ARM debug architecture and includes logic 
to assist in both hardware and software debug. The ARM926EJ-S processor has a 
Harvard cached architecture and provides a complete high-performance processor 
subsystem, including: 

o an ARM9EJ-S integer core 

o a Memory Management Unit (MMU) 

o separate instruction and data AMBA AHB bus interfaces 

o separate instruction and data TCM interfaces. 


The ARM926EJ-S processor provides support for external coprocessors enabling 
floating-point or other application-specific hardware acceleration to be added. The 
ARM926EJ-S processor implements ARM architecture v5TEJ.


The ARM926EJ-S processor is a synthesizable macrocell. This means that you can
optimize the macrocell for a particular target library, and that you can configure the 
memory system to suit your target application. You can individually configure the cache 
sizes to be any power of two between 4KB and 128KB. 


The tightly-coupled instruction and data memories are instantiated externally to the 
ARM926EJ-S macrocell, providing you with the flexibility of optimizing the memory 
subsystem for performance, power, and particular RAM type. The TCM interfaces 
enable nonzero wait state memory to be attached, in addition to providing a mechanism 
for supporting DMA. 


Figure 1-1 on page 1-3 shows the main blocks in the ARM926EJ-S processor.

Figure 1-1 ARM926EJ-S block diagram

Figure 1-2 on page 1-4 and Figure 1-3 on page 1-5 show the ARM926EJ-S interfaces.

Figure 1-2 ARM926EJ-S interface diagram, part one

Figure 1-3 ARM926EJ-S interface diagram, part two

Chapter 2
Programmer's Model 


This chapter describes the ARM926EJ-S registers in CP15, the system control
coprocessor, and provides information for programming the microprocessor. It contains 
the following sections: 


o About the programmer's model on page 2-2
o Summary of ARM926EJ-S system control coprocessor (CP15) registers on
  page 2-3
o Register descriptions on page 2-7.

2.1 About the programmer's model


The system control coprocessor (CP15) is used to configure and control the
ARM926EJ-S processor. The caches, Tightly-Coupled Memories (TCMs), Memory
Management Unit (MMU), and most other system options are controlled using CP15
registers. You can only access CP15 registers with MRC and MCR instructions in a
privileged mode. CDP, LDC, STC, MCRR, and MRRC instructions, and unprivileged
MRC or MCR instructions to CP15 cause the Undefined instruction exception to be
taken.

2.2 Summary of ARM926EJ-S system control coprocessor (CP15) registers


CP15 defines 16 registers. Table 2-1 shows the read and write functions of the registers. 


Table 2-1 CP15 register summary

Register    Reads                          Writes
0           ID code (a)                    Unpredictable
0           Cache type (a)                 Unpredictable
0           TCM status (a)                 Unpredictable
1           Control                        Control
2           Translation table base         Translation table base
3           Domain access control          Domain access control
4           Reserved                       Reserved
5           Data fault status (a)          Data fault status (a)
5           Instruction fault status (a)   Instruction fault status (a)
6           Fault address                  Fault address
7           Cache operations               Cache operations
8           Unpredictable                  TLB operations
9           Cache lockdown (b)             Cache lockdown (b)
9           TCM region (b)                 TCM region (b)
10          TLB lockdown                   TLB lockdown
11 and 12   Reserved                       Reserved
13          FCSE PID (a)                   FCSE PID (a)
13          Context ID (a)                 Context ID (a)
14          Reserved                       Reserved
15          Test configuration             Test configuration

a. Register locations 0, 5, and 13 each provide access to more than one register. The
   register accessed depends on the value of the Opcode_2 field.
b. Register location 9 provides access to more than one register. The register accessed
   depends on the value of the CRm field. See the register descriptions for details.

All CP15 register bits that are defined and contain state are set to 0 by Reset except:

o The V bit is set to 0 at reset if the VINITHI signal is LOW, or 1 if the VINITHI
  signal is HIGH.

o The B bit is set to 0 at reset if the BIGENDINIT signal is LOW, or 1 if the
  BIGENDINIT signal is HIGH.

o The instruction TCM is enabled at reset if the INITRAM pin is HIGH. This
  enables booting from the instruction TCM and sets the ITCM bit in the ITCM
  region register to 1.


2.2.1 Addresses in an ARM926EJ-S system 


Three distinct types of address exist in an ARM926EJ-S system. Table 2-2 shows the
address types in the ARM926EJ-S processor.

Table 2-2 Address types in ARM926EJ-S

Domain         ARM9EJ-S               Caches and MMU                   TCM and AMBA bus
Address type   Virtual Address (VA)   Modified Virtual Address (MVA)   Physical Address (PA)


This is an example of the address manipulation that occurs when the ARM9EJ-S core
requests an instruction:

1. The VA of the instruction is issued by the ARM9EJ-S core.

2. The VA is translated using the FCSE PID value to the MVA. The Instruction
   Cache (ICache) and Memory Management Unit (MMU) detect the MVA. See
   Process ID Register c13 on page 2-32.

3. If the protection check carried out by the MMU on the MVA does not abort and
   the MVA tag is in the ICache, the instruction data is returned to the ARM9EJ-S
   core.

4. If the protection check carried out by the MMU on the MVA does not abort, and
   the cache misses because the MVA tag is not in the cache, then the MMU
   translates the MVA to produce the PA. This address is given to the AMBA bus
   interface to perform an external access.
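
The VA to MVA step can be sketched in Python. This is a minimal illustration, not
a model of the hardware. It assumes the FCSE rule defined by the ARM architecture:
a VA in the bottom 32MB (bits [31:25] all zero) is relocated by the process ID
held in bits [31:25] of the FCSE PID register, and every other VA passes through
unchanged. The function name va_to_mva is hypothetical.

```python
FCSE_PID_MASK = 0xFE000000  # assumed position of the process ID, bits [31:25]

def va_to_mva(va, fcse_pid_register):
    """Return the MVA for a 32-bit VA, given the raw FCSE PID register value."""
    va &= 0xFFFFFFFF
    if (va & FCSE_PID_MASK) == 0:
        # VA lies in the bottom 32MB: substitute the process ID into bits [31:25].
        return va | (fcse_pid_register & FCSE_PID_MASK)
    # Any other VA is used unmodified.
    return va

# Process ID 2 sits in bits [31:25], so the register value is 2 << 25.
print(hex(va_to_mva(0x00001000, 2 << 25)))
```

With the FCSE PID register at zero, the function returns every VA unchanged, so
VA and MVA are identical.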


2.2.2 Accessing CP15 registers 


You can only access CP15 registers with MRC and MCR instructions in a privileged
mode. The instruction bit pattern of the MCR and MRC instructions is shown in
Figure 2-1 on page 2-5.

Figure 2-1 CP15 MRC and MCR bit pattern

The mnemonics for these instructions are:

MCR{cond} p15,<Opcode_1>,<Rd>,<CRn>,<CRm>,<Opcode_2>
MRC{cond} p15,<Opcode_1>,<Rd>,<CRn>,<CRm>,<Opcode_2>
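
As an illustration of how the mnemonic fields map onto an instruction word, the
following Python sketch assembles the 32-bit encoding of a CP15 access. The field
positions are an assumption taken from the standard ARM coprocessor register
transfer encoding in the ARM Architecture Reference Manual (cond in bits [31:28],
Opcode_1 in [23:21], the read/write bit in [20], CRn in [19:16], Rd in [15:12],
the coprocessor number in [11:8], Opcode_2 in [7:5], and CRm in [3:0]). The
function name encode_cp15_access is hypothetical.

```python
def encode_cp15_access(read, opcode_1, rd, crn, crm, opcode_2, cond=0xE):
    """Build the ARM encoding of an MRC (read=True) or MCR (read=False) to p15."""
    fields = (("cond", cond, 4), ("opcode_1", opcode_1, 3), ("rd", rd, 4),
              ("crn", crn, 4), ("crm", crm, 4), ("opcode_2", opcode_2, 3))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(name + " out of range")
    return ((cond << 28) | (0b1110 << 24) | (opcode_1 << 21)
            | (int(read) << 20) | (crn << 16) | (rd << 12)
            | (15 << 8)          # coprocessor number 15
            | (opcode_2 << 5) | (1 << 4) | crm)

# MRC p15,0,r0,c0,c0,0 with the default condition (always).
print(hex(encode_cp15_access(True, 0, 0, 0, 0, 0)))
```

The default cond value of 0xE is the always condition, which is what an
assembler emits when {cond} is omitted.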


Attempting to read from a write-only register, or writing to a read-only register causes
Unpredictable results. In all instructions that access CP15:

o The Opcode_1 field Should Be Zero except when the values specified are used to
  select the required operations. Using other values results in Unpredictable
  behavior.

o The Opcode_2 and CRm fields Should Be Zero except when the values specified
  are used to select the required behavior. Using other values results in
  Unpredictable behavior.


Table 2-3 shows the terms and abbreviations used in this chapter.

Table 2-3 CP15 abbreviations

Term                Abbreviation   Description
Unpredictable       UNP            For reads: The data returned when reading from
                                   this location is unpredictable. It can have any
                                   value.
                                   For writes: Writing to this location causes
                                   unpredictable behavior, or an unpredictable
                                   change in device configuration.
Undefined           UND            An instruction that accesses CP15 in the manner
                                   indicated takes the Undefined instruction
                                   exception.
Should Be Zero      SBZ            When writing to this location, all bits of this field
                                   Should Be Zero.
Should Be One       SBO            When writing to this location, all bits in this field
                                   Should Be One.
Should Be Zero or   SBZP           When writing to this location, all bits of this field
Preserved                          Should Be Zero or preserved by writing the same
                                   value that has been previously read from the same
                                   field.

In all cases, reading from, or writing any data values to any CP15 registers, including
those fields specified as Unpredictable, Should Be One, or Should Be Zero does not
cause any physical damage to the chip.

2.3 Register descriptions


The following registers are described in this section: 


o ID Code, Cache Type, and TCM Status Registers, c0
o Control Register c1 on page 2-12
o Translation Table Base Register c2 on page 2-16
o Domain Access Control Register c3 on page 2-17
o Register c4 on page 2-17
o Fault Status Registers c5 on page 2-18
o Fault Address Register c6 on page 2-19
o Cache Operations Register c7 on page 2-19
o TLB Operations Register c8 on page 2-23
o Cache Lockdown and TCM Region Registers c9 on page 2-25
o TLB Lockdown Register c10 on page 2-30
o Register c11 and c12 on page 2-32
o Process ID Register c13 on page 2-32
o Register c14 on page 2-34
o Test and Debug Register c15 on page 2-34.


2.3.1 ID Code, Cache Type, and TCM Status Registers, c0 


Register c0 accesses the ID Register, Cache Type Register, and TCM Status Registers.
Reading from this register returns the device ID, the cache type, or the TCM status
depending on the value of Opcode_2 used:

Opcode_2 = 0    ID value.
Opcode_2 = 1    Instruction and data cache type.
Opcode_2 = 2    TCM status.

The CRm field Should Be Zero when reading from these registers. Table 2-4 shows the
instructions you can use to read register c0.

Table 2-4 Reading from register c0

Function          Instruction
Read ID code      MRC p15,0,<Rd>,c0,c0,{0, 3-7}
Read cache type   MRC p15,0,<Rd>,c0,c0,1
Read TCM status   MRC p15,0,<Rd>,c0,c0,2

Writing to register c0 is Unpredictable.

ID Code Register c0

This is a read-only register that returns the 32-bit device ID code.

You can access the ID Code Register by reading CP15 register c0 with the Opcode_2
field set to any value other than 1 or 2. For example:

MRC p15, 0, <Rd>, c0, c0, {0, 3-7} ; returns ID

The contents of the ID Code Register are shown in Table 2-5.


Table 2-5 Register 0, ID code

Register bits   Function                              Value
[31:24]         ASCII code of implementer trademark   0x41
[23:20]         Variant                               0x0
[19:16]         Architecture (ARMv5TEJ)               0x6
[15:4]          Part number                           0x926
[3:0]           Revision                              0x5 (a)

a. The revision value can be in the range 0x0 to 0x5, depending on the
   layout revision you are using.
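
The field boundaries in Table 2-5 can be applied to a value read from the ID Code
Register with a few shifts and masks. The Python sketch below is illustrative
only. The function name decode_id_code is hypothetical, and the sample value is
assembled from the Table 2-5 entries with revision 0x5.

```python
def decode_id_code(value):
    """Split an ID Code Register value into the fields of Table 2-5."""
    return {
        "implementer": chr((value >> 24) & 0xFF),  # ASCII code, 0x41 is 'A'
        "variant": (value >> 20) & 0xF,
        "architecture": (value >> 16) & 0xF,       # 0x6 denotes ARMv5TEJ
        "part_number": (value >> 4) & 0xFFF,
        "revision": value & 0xF,
    }

fields = decode_id_code(0x41069265)
print(fields["implementer"], hex(fields["part_number"]), fields["revision"])
```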
Cache Type Register c0 


This is a read-only register that contains information about the size and architecture of
the Instruction Cache (ICache) and Data Cache (DCache), enabling operating systems
to establish how to perform such operations as cache cleaning and lockdown.

You can access the Cache Type Register by reading CP15 register c0 with the Opcode_2
field set to 1. For example:

MRC p15, 0, <Rd>, c0, c0, 1 ; returns cache details

The format of the Cache Type Register is shown in Figure 2-2.

Bits      Field
[31:29]   0 0 0
[28:25]   Ctype
[24]      S
[23:12]   Dsize
[11:0]    Isize

Figure 2-2 Cache Type Register format

Ctype    The Ctype field determines the cache type. See Table 2-6 on page 2-9.

S bit    Specifies if the cache is a unified cache, S=0, or separate ICache and
         DCache, S=1. If S=0, the Isize and Dsize fields both describe the unified
         cache and must be identical. In the ARM926EJ-S processor, this bit is set
         to 1 to denote separate caches.

Dsize    Specifies the size, line length, and associativity of the DCache, or of the
         unified cache if the S bit is 0.

Isize    Specifies the size, line length, and associativity of the ICache, or of the
         unified cache if the S bit is 0.

The Ctype field specifies if the cache supports lockdown or not, and how it is cleaned.
The encoding is shown in Table 2-6. All unused values are reserved.


Table 2-6 Ctype encoding

Value   Method       Cache cleaning          Cache lockdown
b1110   Write-back   Register 7 operations   Format C (a)

a. See Cache Lockdown Register c9 on page 2-25 for more details on
   Format C for cache lockdown.

The Dsize and Isize fields in the Cache Type Register have the same format. This is
shown in Figure 2-3.

Bits      Field
[11:10]   0 0
[9:6]     Size
[5:3]     Assoc
[2]       M
[1:0]     Len

Figure 2-3 Dsize and Isize field format


Size     The Size field determines the cache size in conjunction with the M bit.

Assoc    The Assoc field determines the cache associativity in conjunction with
         the M bit.

M bit    The multiplier bit determines the cache size and cache associativity
         values in conjunction with the Size and Assoc fields. If the cache is
         present, M must be set to 0. If the cache is absent, M must be set to 1. For
         the ARM926EJ-S processor, M is always set to 0.

Len      The Len field determines the line length of the cache.

The size of the cache is determined by the Size field and the M bit. The M bit is 0 for
the DCache and ICache. The Size field is bits [21:18] for the DCache and bits [9:6] for
the ICache. The minimum size of each cache is 4KB, and the maximum size is 128KB.
Table 2-7 shows the cache size encoding.


Table 2-7 Cache size encoding (M=0)

Size field   Cache size
b0011        4KB
b0100        8KB
b0101        16KB
b0110        32KB
b0111        64KB
b1000        128KB


The associativity of the cache is determined by the Assoc field and the M bit. The M bit
is 0 for the DCache and ICache. The Assoc field is bits [17:15] for the DCache and bits
[5:3] for the ICache. Table 2-8 shows the cache associativity encoding.


Table 2-8 Cache associativity encoding (M=0)

Assoc field    Associativity
b010           4-way
Other values   Reserved


The line length of the cache is determined by the Len field. The Len field is bits [13:12]
for the DCache and bits [1:0] for the ICache. Table 2-9 shows the line length encoding.


Table 2-9 Line length encoding

Len field      Cache line length
b10            8 words (32 bytes)
Other values   Reserved


The Cache Type Register values for an ARM926EJ-S processor with the following
configuration are shown in Table 2-10 on page 2-11:

o separate instruction and data caches
o DCache size = 8KB, ICache size = 16KB
o associativity = 4-way
o line length = eight words
o caches use write-back, register 7 for cache cleaning, and Format C for cache
  lockdown.

See Cache Lockdown Register c9 on page 2-25 for more details on Format C for cache
lockdown.

Table 2-10 Example Cache Type Register format

Function           Register bits   Value
Reserved           [31:29]         b000
Ctype              [28:25]         b1110
S                  [24]            b1 = Harvard cache
Dsize   Reserved   [23:22]         b00
        Size       [21:18]         b0100 = 8KB
        Assoc      [17:15]         b010 = 4-way
        M          [14]            b0
        Len        [13:12]         b10 = 8 words per line (32 bytes)
Isize   Reserved   [11:10]         b00
        Size       [9:6]           b0101 = 16KB
        Assoc      [5:3]           b010 = 4-way
        M          [2]             b0
        Len        [1:0]           b10 = 8 words per line (32 bytes)
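
The example in Table 2-10 corresponds to a register value of 0x1D112152, obtained
by placing each listed bit pattern at its bit position. The Python sketch below
decodes such a value using only the encodings given in Table 2-7, Table 2-8, and
Table 2-9. It is illustrative, the function and dictionary names are
hypothetical, and reserved encodings decode to None.

```python
CACHE_SIZE_KB = {0b0011: 4, 0b0100: 8, 0b0101: 16,
                 0b0110: 32, 0b0111: 64, 0b1000: 128}   # Table 2-7
ASSOCIATIVITY = {0b010: 4}                              # Table 2-8
LINE_WORDS = {0b10: 8}                                  # Table 2-9

def decode_size_field(field):
    """Decode a 12-bit Dsize or Isize field laid out as in Figure 2-3."""
    return {
        "size_kb": CACHE_SIZE_KB.get((field >> 6) & 0xF),
        "ways": ASSOCIATIVITY.get((field >> 3) & 0x7),
        "m_bit": (field >> 2) & 0x1,   # 0 = cache present, 1 = cache absent
        "line_words": LINE_WORDS.get(field & 0x3),
    }

def decode_cache_type(value):
    """Decode a Cache Type Register value laid out as in Figure 2-2."""
    return {
        "ctype": (value >> 25) & 0xF,
        "separate": bool((value >> 24) & 0x1),
        "dcache": decode_size_field((value >> 12) & 0xFFF),
        "icache": decode_size_field(value & 0xFFF),
    }

info = decode_cache_type(0x1D112152)
print(info["dcache"]["size_kb"], info["icache"]["size_kb"])
```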


TCM Status Register c0

This is a read-only register that enables operating systems to establish if TCM memories
are present. See also TCM Region Register c9 on page 2-28.

You can access the TCM Status Register by reading CP15 register c0 with the Opcode_2
field set to 2. For example:

MRC p15,0,<Rd>,c0,c0,2 ; returns TCM details

The format of the TCM Status Register is shown in Figure 2-4.

Bits      Field
[31:17]   SBZ/UNP
[16]      DTCM present
[15:1]    SBZ/UNP
[0]       ITCM present

Figure 2-4 TCM Status Register format
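
A value read from the TCM Status Register can be tested for the two presence bits
as in the following Python sketch. It is illustrative only and assumes the bit
positions shown in Figure 2-4. The function name decode_tcm_status is
hypothetical. All other bits are ignored because they are SBZ/UNP.

```python
def decode_tcm_status(value):
    """Report which TCMs are present, from a TCM Status Register value."""
    return {
        "dtcm_present": bool(value & (1 << 16)),  # bit [16]
        "itcm_present": bool(value & (1 << 0)),   # bit [0]
    }

print(decode_tcm_status(0x00010001))
```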


2.3.2 Control Register c1 


Register c1 is the Control Register for the ARM926EJ-S processor. This register
specifies the configuration used to enable and disable the caches and MMU. It is
recommended that you access this register using a read-modify-write sequence.

For both reading and writing, the CRm and Opcode_2 fields Should Be Zero. To read
and write this register, use the instructions:

MRC p15, 0, <Rd>, c1, c0, 0 ; read control register
MCR p15, 0, <Rd>, c1, c0, 0 ; write control register

All defined control bits are set to zero on reset except the V bit and the B bit. The V bit
is set to zero at reset if the VINITHI signal is LOW, or one if the VINITHI signal is
HIGH. The B bit is set to zero at reset if the BIGENDINIT signal is LOW, or one if the
BIGENDINIT signal is HIGH.

Figure 2-5 shows the format of the Control Register.

Figure 2-5 Control Register format

Table 2-11 describes the functions of the Control Register bits.

Table 2-11 Control bit functions register c1

Bit       Name     Function
[31:19]   -        Reserved. When read returns an Unpredictable value. When written
                   Should Be Zero, or a value read from bits [31:19] on the same processor.
                   Using a read-modify-write sequence when modifying this register
                   provides the greatest future compatibility.
[18]      -        Reserved, SBO. Read = 1, write = 1.
[17]      -        Reserved, SBZ. Read = 0, write = 0.
[16]      -        Reserved, SBO. Read = 1, write = 1.
[15]      L4 bit   Determines if the T bit is set when load instructions change the PC:
                   0 = loads to PC set the T bit
                   1 = loads to PC do not set the T bit (ARMv4 behavior).
                   For more details see the ARM Architecture Reference Manual.
[14]      RR bit   Replacement strategy for ICache and DCache:
                   0 = Random replacement
                   1 = Round-robin replacement.
[13]      V bit    Location of exception vectors:
                   0 = Normal exception vectors selected, address range = 0x0000 0000 to
                   0x0000 001C
                   1 = High exception vectors selected, address range = 0xFFFF 0000 to
                   0xFFFF 001C.
                   Set to the value of VINITHI on reset.
[12]      I bit    ICache enable/disable:
                   0 = ICache disabled
                   1 = ICache enabled.
[11:10]   -        SBZ.
[9]       R bit    ROM protection.
                   This bit modifies the ROM protection system. See Domain access
                   control on page 3-23.
[8]       S bit    System protection.
                   This bit modifies the MMU protection system. See Domain access
                   control on page 3-23.
[7]       B bit    Endianness:
                   0 = Little-endian operation
                   1 = Big-endian operation.
                   Set to the value of BIGENDINIT on reset.
[6:3]     -        Reserved. SBO.
[2]       C bit    DCache enable/disable:
                   0 = Cache disabled
                   1 = Cache enabled.
[1]       A bit    Alignment fault enable/disable:
                   0 = Data address alignment fault checking disabled
                   1 = Data address alignment fault checking enabled.
[0]       M bit    MMU enable/disable:
                   0 = disabled
                   1 = enabled.

ARM DDI 0198E. Copyright © 2001-2008 ARM Limited. All rights reserved.
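
The recommended read-modify-write sequence can be modeled in Python to check
which bits a change touches. This is a sketch under stated assumptions. The
starting value 0x00050078 is illustrative: it has only the SBO bits [18], [16],
and [6:3] set, and takes the Unpredictable bits [31:19] as zero. The names
CONTROL_BITS, decode_control, and modify_control are hypothetical.

```python
# Bit positions of the named control bits, from Table 2-11.
CONTROL_BITS = {"M": 0, "A": 1, "C": 2, "B": 7, "S": 8, "R": 9,
                "I": 12, "V": 13, "RR": 14, "L4": 15}

def decode_control(value):
    """Return the state of every named bit of the Control Register."""
    return {name: (value >> bit) & 1 for name, bit in CONTROL_BITS.items()}

def modify_control(current, set_bits=(), clear_bits=()):
    """Return the value to write back, changing only the named bits."""
    new = current
    for name in set_bits:
        new |= 1 << CONTROL_BITS[name]
    for name in clear_bits:
        new &= ~(1 << CONTROL_BITS[name])
    return new & 0xFFFFFFFF

value_read = 0x00050078  # illustrative value standing in for the MRC result
value_to_write = modify_control(value_read, set_bits=("M", "C", "I"))
print(hex(value_to_write))
```

Because the function starts from the value that was read, all reserved bits are
written back unchanged, which is the point of the read-modify-write sequence.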


Effects of Control Register on caches 


The bits of the Control Register that directly affect the ICache and DCache behavior are:

o the M bit
o the C bit
o the I bit
o the RR bit.


Assuming that TCM regions are disabled, the caches behave as shown in Table 2-12. 


Table 2-12 Effects of Control Register on caches

Cache             MMU          Behavior
ICache disabled   Enabled or   All instruction fetches are from external memory (AHB).
                  disabled

ICache enabled    Disabled     All instruction fetches are cacheable, with no protection
                               checks. All addresses are flat mapped. That is,
                               VA = MVA = PA.

ICache enabled    Enabled      Instruction fetches are cacheable or noncacheable, and
                               protection checks are performed. All addresses are
                               remapped from VA to PA, depending on the MMU page table
                               entry. That is, VA translated to MVA, MVA remapped to PA.

DCache disabled   Enabled or   All data accesses are to external memory (AHB).
                  disabled

DCache enabled    Disabled     All data accesses are noncacheable nonbufferable. All
                               addresses are flat mapped. That is, VA = MVA = PA.

DCache enabled    Enabled      All data accesses are cacheable or noncacheable, and
                               protection checks are performed. All addresses are
                               remapped from VA to PA, depending on the MMU page table
                               entry. That is, VA translated to MVA, MVA remapped to PA.

If either the DCache or the ICache is disabled, then the contents of that cache are not
accessed. If the cache is subsequently re-enabled, the contents are unchanged. To
guarantee that memory coherency is maintained, the DCache must be cleaned of dirty
data before it is disabled.


Effects of the Control Register on TCM interface 


The M bit of the Control Register, when combined with the En bit in the respective TCM
region register c9, directly affects the TCM interface behavior, as shown in Table 2-13.


Table 2-13 Effects of Control Register on TCM interface

TCM            MMU        Cache      Behavior
Instruction    Disabled   ICache     All instruction fetches are from the external memory
TCM disabled              disabled   (AHB).

Instruction    Disabled   ICache     All instruction fetches are from the TCM interface, or
TCM enabled               disabled   from external memory (AHB), depending on the setting of
                                     the base address in the instruction TCM region register.
                                     No protection checks are made. All addresses are flat
                                     mapped. That is, VA = MVA = PA.

Instruction    Disabled   ICache     All instruction fetches are from the TCM interface, or
TCM enabled               enabled    from the ICache, depending on the setting of the base
                                     address in the instruction TCM region register. No
                                     protection checks are made. All addresses are flat
                                     mapped. That is, VA = MVA = PA.

Instruction    Enabled    ICache     All instruction fetches are from the TCM interface, or
TCM enabled               enabled    from the ICache/AHB interface, depending on the setting
                                     of the base address in the instruction TCM region
                                     register. Protection checks are made. All addresses are
                                     remapped from VA to PA, depending on the page entry.
                                     That is, the VA is translated to an MVA, and the MVA is
                                     remapped to a PA.

Data TCM       Disabled   DCache     All data accesses are to external memory (AHB).
disabled                  disabled

Data TCM       Disabled   DCache     All data accesses are to the TCM interface, or to the
enabled                   disabled   external memory, depending on the setting of the base
                                     address in the data TCM region register. No protection
                                     checks are made. All addresses are flat mapped. That is,
                                     VA = MVA = PA.

Data TCM       Disabled   DCache     All data accesses are to the TCM interface or to
enabled                   enabled    external memory, depending on the setting of the base
                                     address in the data TCM region register. All addresses
                                     are flat mapped. That is, VA = MVA = PA.

Data TCM       Enabled    DCache     All data accesses are either from the TCM interface, or
enabled                   enabled    from the DCache/AHB interface, depending on the setting
                                     of the base address in the data TCM region register.
                                     Protection checks are made. All addresses are remapped
                                     from VA to PA, depending on the page entry. That is,
                                     the VA is translated to an MVA, and the MVA is remapped
                                     to a PA.


Note 


Read accesses on the TCM interface are not prevented when an ARM9EJ-S processor
memory access is aborted. All reads on the TCM interface must be treated as
speculative. ARM926EJ-S processor write accesses that are aborted do not take place
on the TCM interface.





2.3.3 Translation Table Base Register c2 


Register c2 is the Translation Table Base Register (TTBR), for the base address of the 
first-level translation table. 


Reading from c2 returns the pointer to the currently active first-level translation table in 
bits [31:14] and an Unpredictable value in bits [13:0]. 


Writing to register c2 updates the pointer to the first-level translation table from the 
value in bits [31:14] of the written value. Bits [13:0] Should Be Zero. 


You can use the following instructions to access the TTBR: 


MRC p15, 0, <Rd>, c2, c0, 0 ; read TTBR 
MCR p15, 0, <Rd>, c2, c0, 0 ; write TTBR 


The CRm and Opcode 2 fields Should Be Zero when writing to c2. 
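
The bit layout described above can be illustrated with a small Python model. This is only a sketch of the field arithmetic, not code that runs on the processor, and the function names make_ttbr and ttbr_table_base are hypothetical. It assumes that a table base with non-zero bits [13:0] should be rejected, because those bits Should Be Zero on a write.

```python
TTBR_BASE_MASK = 0xFFFFC000  # bits [31:14]: first-level translation table base
TTBR_SBZ_MASK = 0x00003FFF   # bits [13:0]: Should Be Zero on a write


def make_ttbr(table_base):
    """Return the 32-bit value to write to c2 for a table at table_base."""
    if not 0 <= table_base <= 0xFFFFFFFF:
        raise ValueError("table base must fit in 32 bits")
    if table_base & TTBR_SBZ_MASK:
        # Bits [13:0] Should Be Zero, so the table must be 16KB aligned.
        raise ValueError("table base must be 16KB aligned")
    return table_base & TTBR_BASE_MASK


def ttbr_table_base(ttbr_value):
    """Extract the table pointer from a value read from c2.

    Bits [13:0] are Unpredictable on a read, so they are masked off.
    """
    return ttbr_value & TTBR_BASE_MASK
```

Masking on the read side matters because software must not rely on the value returned in bits [13:0].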


Figure 2-6 on page 2-17 shows the format of the Translation Table Base Register. 

Bits [31:14]   Translation table base
Bits [13:0]    UNP/SBZ

Figure 2-6 TTBR format 


2.3.4 Domain Access Control Register c3 


Register c3 is the Domain Access Control Register consisting of 16 two-bit fields as 
shown in Figure 2-7. 


Bits [2n+1:2n]   Dn, for n = 15 down to 0 (D15 in bits [31:30], D0 in bits [1:0])

Figure 2-7 Register c3 format 


Each two-bit field defines the access permissions for one of the 16 domains, D15-D0. 
See Table 2-14. 


Reading from c3 returns the value of the Domain Access Control Register. 


Writing to c3 writes the value of the Domain Access Control Register. 


Table 2-14 Domain access control defines 

Value   Meaning     Description
00      No access   Any access generates a domain fault.
01      Client      Accesses are checked against the access permission bits in
                    the section or page descriptor.
10      Reserved    Reserved. Currently behaves like the no access mode.
11      Manager     Accesses are not checked against the access permission
                    bits so a permission fault cannot be generated.


You can use the following instructions to access the Domain Access Control Register: 


MRC p15, 0, <Rd>, c3, c0, 0 ; read domain access permissions 
MCR p15, 0, <Rd>, c3, c0, 0 ; write domain access permissions 


2.3.5 Register c4 


Accessing, reading or writing, this register causes Unpredictable behavior. 


2.3.6 Fault Status Registers c5 


Register c5 accesses the Fault Status Registers (FSRs). The FSRs contain the source of 
the last instruction or data fault. The instruction-side FSR is intended for debug 
purposes only. The FSR is updated for alignment faults, and external aborts that occur 
while the MMU is disabled. 


The FSR accessed is determined by the value of the Opcode 2 field: 

Opcode 2=0 Data Fault Status Register (DFSR). 

Opcode 2=1 Instruction Fault Status Register (IFSR). 

The fault type encoding is listed in Table 3-9 on page 3-21. 

You can access the FSRs using the following instructions: 


MRC p15, 0, <Rd>, c5, c0, 0 ; read DFSR 
MCR p15, 0, <Rd>, c5, c0, 0 ; write DFSR 
MRC p15, 0, <Rd>, c5, c0, 1 ; read IFSR 
MCR p15, 0, <Rd>, c5, c0, 1 ; write IFSR 


The format of the Fault Status Register is shown in Figure 2-8. 

Bits [31:9]   UNP/SBZ
Bit [8]       0
Bits [7:4]    Domain
Bits [3:0]    Status

Figure 2-8 FSR format 

Table 2-15 shows the bit field descriptions for the FSR. 


Table 2-15 FSR bit field descriptions 

Bits     Description
[31:9]   UNP/SBZP.
[8]      Always reads as zero. Writes ignored.
[7:4]    Specifies which of the 16 domains (D15-D0) was being
         accessed when a data fault occurred.
[3:0]    Type of fault generated. See Table 2-16 on page 2-19.


Table 2-16 shows the encodings used for the status field in the FSR, and if the Domain 
field contains valid information. See Fault address and fault status registers on 
page 3-20 for details of MMU aborts. 


Table 2-16 FSR status field encoding 

Priority   Source                          Size              Status   Domain
Highest    Alignment                       -                 b00x1    Invalid
           External abort on translation   First level       b1100    Invalid
                                           Second level      b1110    Valid
           Translation                     Section           b0101    Invalid
                                           Page              b0111    Valid
           Domain                          Section           b1001    Valid
                                           Page              b1011    Valid
           Permission                      Section           b1101    Valid
                                           Page              b1111    Valid
Lowest     External abort                  Section or page   b10x0    Invalid
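
As an illustration of how a handler or debugger might interpret a value read from the DFSR, the following Python sketch decodes the Status and Domain fields using the encodings of Table 2-16. The function name decode_fsr and the returned dictionary layout are hypothetical. The x in b00x1 and b10x0 is treated as a don't-care bit, and encodings that the table does not list are reported as None.

```python
# (mask, value, source, size, domain field valid), taken from Table 2-16.
# A mask of 0b1101 ignores bit 1, which is the "x" in b00x1 and b10x0.
FSR_STATUS = [
    (0b1101, 0b0001, "Alignment", "-", False),
    (0b1111, 0b1100, "External abort on translation", "First level", False),
    (0b1111, 0b1110, "External abort on translation", "Second level", True),
    (0b1111, 0b0101, "Translation", "Section", False),
    (0b1111, 0b0111, "Translation", "Page", True),
    (0b1111, 0b1001, "Domain", "Section", True),
    (0b1111, 0b1011, "Domain", "Page", True),
    (0b1111, 0b1101, "Permission", "Section", True),
    (0b1111, 0b1111, "Permission", "Page", True),
    (0b1101, 0b1000, "External abort", "Section or page", False),
]


def decode_fsr(fsr):
    """Decode bits [7:4] (Domain) and [3:0] (Status) of an FSR value."""
    status = fsr & 0xF
    for mask, value, source, size, domain_valid in FSR_STATUS:
        if status & mask == value:
            domain = (fsr >> 4) & 0xF if domain_valid else None
            return {"source": source, "size": size, "domain": domain}
    return None  # encoding not listed in Table 2-16
```

The domain is returned only when the table marks the Domain field as valid for that fault type, so callers cannot accidentally use a stale value.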


2.3.7 Fault Address Register c6 


Register c6 accesses the Fault Address Register (FAR). The FAR contains the Modified 
Virtual Address of the access being attempted when a Data Abort occurred. The FAR is 
only updated for Data Aborts, not for Prefetch Aborts. The FAR is updated for 
alignment faults, and external aborts that occur while the MMU is disabled. 


You can use the following instructions to access the FAR: 


MRC p15, 0, <Rd>, c6, c0, 0 ; read FAR 
MCR p15, 0, <Rd>, c6, c0, 0 ; write FAR 


Writing c6 sets the FAR to the value of the data written. This is useful for a debugger to 
restore the value of the FAR to a previous state. 


The CRm and Opcode 2 fields Should Be Zero when reading or writing CP15 c6. 


2.3.8 Cache Operations Register c7 


Register c7 controls the caches and the write buffer. The function of each cache 
operation is selected by the Opcode 2 and CRm fields in the MCR instruction used to 
write to CP15 c7. Writing other Opcode 2 or CRm values is Unpredictable. 


ARM DDI 0198E Copyright © 2001-2008 ARM Limited. All rights reserved. 




Reading from CP15 c7 is Unpredictable, with the exception of the two test and clean 
operations. See Table 2-18 on page 2-21 and Test and clean operations on page 2-23. 


You can use the following instruction to write to c7: 
MCR p15, <Opcode 1>, <Rd>, <CRn>, <CRm>, <Opcode 2> 


The cache functions, and a description of each function, provided by this register are 
listed in Table 2-17. 


Table 2-17 Function descriptions register c7 

Function                           Description

Invalidate cache                   Invalidates all cache data, including any dirty data.

Invalidate single entry using      Invalidates a single cache line, discarding any dirty data.
either index or modified
virtual address

Clean single data entry using      Writes the specified DCache line to main memory if the
either index or modified           line is marked valid and dirty. The line is marked as not
virtual address                    dirty. The valid bit is unchanged.

Clean and invalidate single        Writes the specified DCache line to main memory if the
data entry using either index      line is marked valid and dirty. The line is marked not valid.
or modified virtual address

Test and clean DCache              Tests a number of cache lines, and cleans one of them if any
                                   are dirty. Returns the overall dirty state of the cache in bit
                                   30. See Test and clean operations on page 2-23.

Test, clean, and invalidate        As for test and clean, except that when the entire cache has
DCache                             been tested and cleaned, it is invalidated. See Test and clean
                                   operations on page 2-23.

Prefetch ICache line               Performs an ICache lookup of the specified modified
                                   virtual address. If the cache misses, and the region is
                                   cacheable, a linefill is performed.

Drain write buffer                 This instruction acts as an explicit memory barrier. It drains
                                   the contents of the write buffers of all memory stores
                                   occurring in program order before this instruction is
                                   completed. No instructions occurring in program order
                                   after this instruction are executed until it completes. This
                                   can be used when timing of specific stores to the level two
                                   memory system has to be controlled, for example when a
                                   store to an interrupt acknowledge location has to complete
                                   before interrupts are enabled.

Wait for interrupt                 This instruction drains the contents of the write buffers,
                                   puts the processor into a low-power state, and stops it from
                                   executing more instructions until an interrupt, or debug
                                   request, occurs. When an interrupt does occur, the MCR
                                   instruction completes and the IRQ or FIQ handler is entered
                                   as normal. The return link in R14_irq or R14_fiq contains
                                   the address of the MCR instruction plus eight, so that the
                                   typical instruction used for interrupt return (SUBS
                                   PC,R14,#4) returns to the instruction following the MCR.


Table 2-18 lists the cache operation functions and the associated data and instruction 
formats for c7. 


Table 2-18 Cache operations c7 

Function/operation                            Data format   Instruction
Invalidate ICache and DCache                  SBZ           MCR p15, 0, <Rd>, c7, c7, 0
Invalidate ICache                             SBZ           MCR p15, 0, <Rd>, c7, c5, 0
Invalidate ICache single entry (MVA)          MVA           MCR p15, 0, <Rd>, c7, c5, 1
Invalidate ICache single entry (Set/Way)      Set/Way       MCR p15, 0, <Rd>, c7, c5, 2
Prefetch ICache line (MVA)                    MVA           MCR p15, 0, <Rd>, c7, c13, 1
Invalidate DCache                             SBZ           MCR p15, 0, <Rd>, c7, c6, 0
Invalidate DCache single entry (MVA)          MVA           MCR p15, 0, <Rd>, c7, c6, 1
Invalidate DCache single entry (Set/Way)      Set/Way       MCR p15, 0, <Rd>, c7, c6, 2
Clean DCache single entry (MVA)               MVA           MCR p15, 0, <Rd>, c7, c10, 1
Clean DCache single entry (Set/Way)           Set/Way       MCR p15, 0, <Rd>, c7, c10, 2
Test and clean DCache                         -             MRC p15, 0, <Rd>, c7, c10, 3
Clean and invalidate DCache entry (MVA)       MVA           MCR p15, 0, <Rd>, c7, c14, 1
Clean and invalidate DCache entry (Set/Way)   Set/Way       MCR p15, 0, <Rd>, c7, c14, 2
Test, clean, and invalidate DCache            -             MRC p15, 0, <Rd>, c7, c14, 3
Drain write buffer                            SBZ           MCR p15, 0, <Rd>, c7, c10, 4
Wait for interrupt                            SBZ           MCR p15, 0, <Rd>, c7, c0, 4


The MVA format for Rd for the CP15 c7 MCR operations is shown in Figure 2-9. The 
Tag, Set, and Word fields define the MVA. For all of the cache operations, Word Should 
Be Zero. 

Bits [31:S+5]   Tag
Bits [S+4:5]    Set (= index)
Bits [4:2]      Word
Bits [1:0]      SBZ

Figure 2-9 Register c7 MVA format 


The Set/Way format for Rd for the CP15 c7 MCR operations is shown in Figure 2-10 
on page 2-23, where A and S are the base-two logarithms of the associativity and the 
number of sets. The Set, Way, and Word fields define the format. For all of the cache 
operations, Word Should Be Zero. 


For a 16KB cache, 4-way set associative, 8-word line, then: 
o A = log2(associativity) = log2(4) = 2 
o S = log2(NSETS) where: 
  NSETS = cache size in bytes / associativity / line length in bytes 
  NSETS = 16384 / 4 / 32 = 128 
  Therefore: 
  S = log2(128) = 7 

Bits [31:32-A]    Way
Bits [31-A:S+5]   SBZ
Bits [S+4:5]      Set (= index)
Bits [4:2]        Word
Bits [1:0]        SBZ

Figure 2-10 Register c7 Set/Way format 
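
The calculation of A and S, and the packing of the Way and Set fields, can be checked with a short Python sketch. The function names cache_geometry and set_way_operand are hypothetical. The sketch assumes a power-of-two geometry and the 8-word (32-byte) line used in the example above, so the Set field starts at bit 5 and the Word field is left at zero.

```python
def cache_geometry(cache_bytes, associativity, line_bytes):
    """Return (A, S): log2 of the associativity and of the number of sets."""
    nsets = cache_bytes // associativity // line_bytes
    a = associativity.bit_length() - 1  # exact log2 for a power of two
    s = nsets.bit_length() - 1
    if (1 << a) != associativity or (1 << s) != nsets:
        raise ValueError("associativity and number of sets must be powers of two")
    return a, s


def set_way_operand(cache_bytes, associativity, line_bytes, way, set_index):
    """Build the Rd value for a Set/Way operation, with Word = 0."""
    a, s = cache_geometry(cache_bytes, associativity, line_bytes)
    if not 0 <= way < (1 << a):
        raise ValueError("way out of range")
    if not 0 <= set_index < (1 << s):
        raise ValueError("set index out of range")
    # Way occupies bits [31:32-A], Set occupies bits [S+4:5].
    return ((way << (32 - a)) | (set_index << 5)) & 0xFFFFFFFF
```

For the 16KB, 4-way cache above, way 3 and set 127 give the operand 0xC0000FE0.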


Test and clean operations 


The test and clean DCache instruction provides an efficient way to clean the entire 
DCache using a simple loop. The test and clean DCache instruction tests a number of 
lines in the DCache to determine if any of them are dirty. If any dirty lines are found, 
then one of those lines is cleaned. The test and clean DCache instruction also returns the 
status of the entire DCache in bit 30. 





Note 


The test and clean DCache instruction, MRC p15, 0, r15, c7, c10, 3, is a special 
encoding that uses r15 as a destination operand. However, the PC is not changed by 
using this instruction. This MRC instruction also sets the condition code flags. 


If the cache contains any dirty lines, bit 30 is set to 0. If the cache contains no dirty lines, 
bit 30 is set to 1. This means that you can use the following loop to clean the entire 
DCache: 


tc_loop: MRC p15, 0, r15, c7, c10, 3 ; test and clean 
         BNE tc_loop 


The test, clean, and invalidate DCache instruction is the same as test and clean DCache, 
except that when the entire cache has been cleaned, it is invalidated. This means that 
you can use the following loop to clean and invalidate the entire DCache: 


tci_loop: MRC p15, 0, r15, c7, c14, 3 ; test, clean, and invalidate 
          BNE tci_loop 
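
The control flow of these loops can be modelled in Python to show why they terminate. This is a toy model and not a description of the hardware: the class DCacheModel and the function clean_entire_dcache are hypothetical, each operation is assumed to find and clean exactly one dirty line, and the status returned in bit 30 is assumed to reflect the cache state before that line is cleaned.

```python
class DCacheModel:
    """Toy model that tracks only which cache lines are dirty."""

    def __init__(self, dirty_lines):
        self.dirty = set(dirty_lines)

    def test_and_clean(self):
        """Model one test and clean operation and return bit 30 (the Z flag)."""
        z_flag = 0 if self.dirty else 1  # assumed: sampled before cleaning
        if self.dirty:
            self.dirty.pop()  # clean one dirty line; write-back is not modelled
        return z_flag


def clean_entire_dcache(cache):
    """Repeat the operation until bit 30 reads as 1, like the BNE loop."""
    iterations = 0
    while True:
        iterations += 1
        if cache.test_and_clean() == 1:  # Z set: BNE falls through
            return iterations
```

Under these assumptions a cache with three dirty lines needs four iterations: three that each clean one line, and a final one that reports no dirty lines.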


2.3.9 TLB Operations Register c8 


This is a write-only register used to control the Translation Lookaside Buffer (TLB). 
There is a single TLB used to hold entries for both data and instructions. The TLB is 
divided into two parts: 


o a set-associative part 
o a fully-associative part. 


The fully-associative part, also referred to as the lockdown part of the TLB, is used to 
store entries to be locked down. Entries held in the lockdown part of the TLB are 
preserved during an invalidate TLB operation. Entries can be removed from the 
lockdown TLB using an invalidate TLB single entry operation. 


Six TLB operations are defined, and the function to be performed is selected by the 
Opcode 2 and CRm fields in the MCR instruction used to write CP15 c8. Writing other 
Opcode 2 or CRm values is Unpredictable. Reading from this register is Unpredictable. 


You can use the instructions shown in Table 2-19 to perform TLB operations. 


Table 2-19 Register c8 TLB operations 

ARMv4/ARMv5 operation                ARM926EJ-S operation             Data   Instruction
Invalidate TLB                       Invalidate set-associative TLB   SBZ    MCR p15, 0, <Rd>, c8, c7, 0
Invalidate TLB single entry (MVA)    Invalidate single entry          MVA    MCR p15, 0, <Rd>, c8, c7, 1
Invalidate instruction TLB           Invalidate set-associative TLB   SBZ    MCR p15, 0, <Rd>, c8, c5, 0
Invalidate instruction TLB           Invalidate single entry          MVA    MCR p15, 0, <Rd>, c8, c5, 1
single entry (MVA)
Invalidate data TLB                  Invalidate set-associative TLB   SBZ    MCR p15, 0, <Rd>, c8, c6, 0
Invalidate data TLB single           Invalidate single entry          MVA    MCR p15, 0, <Rd>, c8, c6, 1
entry (MVA)


Those instructions that are intended to be used with dual TLB implementations, such as 
the ARM920T core or the ARM1020T core, apply to any entry, regardless of the type 
of access that caused the entry to be loaded into the TLB. See the ARM Architecture 
Reference Manual. 


The invalidate TLB operations invalidate all the unpreserved entries in the TLB. The 
invalidate TLB single entry operations invalidate any TLB entry corresponding to the 
Modified Virtual Address given in Rd, regardless of its preserved state. See TLB 
Lockdown Register c10 on page 2-30 for a description of how to preserve entries in the 
TLB. 


Figure 2-11 shows the Modified Virtual Address format used for invalidate TLB single 
entry operations. 


Bits [31:10]   Modified virtual address
Bits [9:0]     SBZ

Figure 2-11 Register c8 MVA format 


Note 


If either small or large pages are used, and these pages contain subpage access 
permissions that are different, then you must use four invalidate TLB single entry 
operations, with the MVA set to each subpage, to invalidate all information related to 
that page held in a TLB. 
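
The four addresses needed for such a page can be derived as in the following Python sketch. The function name subpage_mvas is hypothetical. It assumes the page and subpage sizes given in Chapter 3: a 4KB small page has four 1KB subpages, and a 64KB large page has four 16KB subpages.

```python
PAGE_SIZES = {"small": 4 * 1024, "large": 64 * 1024}
SUBPAGES_PER_PAGE = 4


def subpage_mvas(mva, page_kind):
    """Return the MVA of each of the four subpages of the page containing mva."""
    page_size = PAGE_SIZES[page_kind]
    subpage_size = page_size // SUBPAGES_PER_PAGE
    page_base = mva & ~(page_size - 1)
    # Each address is at least 1KB aligned, so bits [9:0] are zero as required.
    return [page_base + i * subpage_size for i in range(SUBPAGES_PER_PAGE)]
```

Each returned value would then be used as Rd for one invalidate TLB single entry operation.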





2.3.10 Cache Lockdown and TCM Region Registers c9 


Register c9 accesses the Cache Lockdown and TCM Region Registers. The register 
accessed is determined by the value of the CRm field: 


CRm=c0 selects the Cache Lockdown Register 
CRm=c1 selects the TCM Region Register. 


Other values of CRm are reserved. 


Cache Lockdown Register c9 


The Cache Lockdown Register uses a cache-way-based locking scheme, Format C, that 
enables you to control each cache way independently. 


These registers enable you to control which cache ways of the four-way cache are used 
for the allocation on a linefill. When the registers are defined, subsequent linefills are 
only placed in the specified target cache way. This gives you some control over the 
cache pollution caused by particular applications, and provides a traditional lockdown 
operation for locking critical code into the cache. 


A locking bit for each cache way determines if the normal cache allocation is allowed 
to access that cache way. See Table 2-21 on page 2-26. 


A maximum of three cache ways of the four-way associative cache can be locked, 
ensuring that normal cache line replacement is performed. 


Note 
If no cache ways have L bits set to 0, then cache way 3 is used for all linefills. 





The first four bits of this register determine the L bit for the associated cache way. The 
Opcode 2 field of the MRC or MCR instruction determines whether the instruction or 
data lockdown register is accessed: 


Opcode 2=0 Selects the DCache lockdown register. 


Opcode 2=1 Selects the ICache lockdown register. 


You can use the instructions shown in Table 2-20 to access the Cache Lockdown 
Register. 


Table 2-20 Cache Lockdown Register instructions 

Function                          Data     Instruction
Read DCache Lockdown Register     L bits   MRC p15, 0, <Rd>, c9, c0, 0
Write DCache Lockdown Register    L bits   MCR p15, 0, <Rd>, c9, c0, 0
Read ICache Lockdown Register     L bits   MRC p15, 0, <Rd>, c9, c0, 1
Write ICache Lockdown Register    L bits   MCR p15, 0, <Rd>, c9, c0, 1


You must only modify the Cache Lockdown Register using a read-modify-write 
sequence. For example: 

MRC p15, 0, <Rn>, c9, c0, 1 ; read the ICache lockdown register 
ORR <Rn>, <Rn>, #0x01       ; set the L bit for way 0 
MCR p15, 0, <Rn>, c9, c0, 1 ; write the ICache lockdown register 


This sequence sets the L bit to 1 for way 0 of the ICache. The format of the cache 
lockdown register c9 is shown in Figure 2-12. 

Bits [31:16]   SBZ/UNP
Bits [15:4]    SBO
Bits [3:0]     L bits (cache ways 0 to 3)

Figure 2-12 Cache Lockdown Register c9 format 


The format of the Cache Lockdown Register L bits is shown in Table 2-21. All cache 
ways are available for allocation from reset. 


Table 2-21 Cache Lockdown Register L bits 

Bits      4-way associative   Notes
[31:16]   UNP/SBZP            Reserved
[15:4]    0xFFF               SBO
[3]       L bit for Way 3     Bits [3:0] are the L bits for each cache way:
[2]       L bit for Way 2     0 = Allocation to the cache way is determined by the
[1]       L bit for Way 1         standard replacement algorithm (reset state)
[0]       L bit for Way 0     1 = No allocation is performed to this cache way.


You can use the cache lockdown and cache unlock procedures described in: 

o Specific loading of addresses into a cache way 
o Cache unlock procedure on page 2-28. 


Specific loading of addresses into a cache way 


The procedure to lock down code and data into way i of a cache with N ways using 
Format C involves making it impossible to allocate to any cache way other than the 
target cache way: 

1. Ensure that no processor exceptions can occur during the execution of this 
   procedure, for example by disabling interrupts. If this is not possible, all code and 
   data used by any exception handlers must be treated as code and data as in steps 
   2 and 3. 

2. If an ICache way is being locked down, ensure that all the code executed by the 
   lockdown procedure is in a noncacheable area of memory, including TCM, or in 
   an already locked cache way. 

3. If a DCache way is being locked down, ensure that all data used by the lockdown 
   procedure is in a noncacheable area of memory, including TCM, or is in an 
   already locked cache way. 

4. Ensure that the data/instructions that are to be locked down are in a cacheable area 
   of memory. 

5. Ensure that the data/instructions that are to be locked down are not already in the 
   cache. Use the register c7 clean and/or invalidate operations to ensure this. 

6. Write to register c9, CRm == 0, setting L == 0 for bit i and L == 1 for all other ways. 
   This enables allocation to the target cache way. 

7. For each of the cache lines to be locked down in cache way i: 

   o If a DCache is being locked down, use an LDR instruction to load a word 
     from the memory cache line to ensure that the memory cache line is loaded 
     into the cache. 

   o If an ICache is being locked down, use the register c7 MCR prefetch ICache 
     line (CRm == c13, Opcode2 == 1) to fetch the memory cache line into the 
     cache. 

8. Write to register c9, CRm == 0, setting L == 1 for bit i and restoring all the other 
   bits to the values they had before the lockdown routine was started. 
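
The two register values written in steps 6 and 8 can be worked out as in the following Python sketch. The function name lockdown_values is hypothetical. It assumes the register layout described for the Cache Lockdown Register, with the L bits in bits [3:0] and bits [15:4] written as ones, and it rejects a request that would leave all four ways locked.

```python
WAYS = 4
L_MASK = (1 << WAYS) - 1   # bits [3:0]: one L bit per cache way
SBO_BITS = 0xFFF0          # bits [15:4]: Should Be One


def lockdown_values(original, target_way):
    """Return (fill_value, final_value) for steps 6 and 8 of the procedure.

    original is the register value read before the procedure starts.
    """
    if not 0 <= target_way < WAYS:
        raise ValueError("target way out of range")
    target_bit = 1 << target_way
    # Step 6: L == 0 for the target way, L == 1 for all other ways.
    fill_value = SBO_BITS | (L_MASK & ~target_bit)
    # Step 8: L == 1 for the target way, other L bits restored.
    final_l_bits = (original & L_MASK) | target_bit
    if final_l_bits == L_MASK:
        raise ValueError("at most three of the four ways can be locked")
    return fill_value, SBO_BITS | final_l_bits
```

For example, locking way 0 from the reset state gives 0xFFFE for step 6 and 0xFFF1 for step 8.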


Cache unlock procedure 


To unlock the locked down portion of the cache, write to register c9 setting L == 0 for 
the appropriate bit. For example, the following sequence sets the L bit to 0 for way 0 of 
the ICache, unlocking way 0: 


MRC p15, 0, <Rn>, c9, c0, 1 ; read the ICache lockdown register 
BIC <Rn>, <Rn>, #0x01       ; clear the L bit for way 0 
MCR p15, 0, <Rn>, c9, c0, 1 ; write the ICache lockdown register 


TCM Region Register c9 


The ARM926EJ-S processor supports physically-indexed, physically-tagged TCM. 
The TCM Region Register supports one region of instruction TCM and one region of 
data TCM. The minimum size of TCM region that can be supported is 4KB. The TCM 
Status Register indicates if TCM memories are attached. See TCM Status Register c0 
on page 2-11. The size of each TCM region is defined by the DRSIZE and IRSIZE 
input pins. 


The data TCM is always disabled at reset. The instruction TCM is enabled at reset if the 
INITRAM pin is HIGH. This enables booting from the instruction TCM and sets the 
ITCM enable bit in the ITCM region register. You can use the TCM Region Register 
instructions listed in Table 2-22. 


Table 2-22 TCM Region Register instructions 

Function                                Data           Instruction
Read data TCM Region Register           Base address   MRC p15, 0, <Rd>, c9, c1, 0
Write data TCM Region Register          Base address   MCR p15, 0, <Rd>, c9, c1, 0
Read instruction TCM Region Register    Base address   MRC p15, 0, <Rd>, c9, c1, 1
Write instruction TCM Region Register   Base address   MCR p15, 0, <Rd>, c9, c1, 1


The TCM Region Register format is shown in Figure 2-13 on page 2-29. 

Bits [31:12]   Base address (physical address)
Bits [11:6]    SBZ/UNP
Bits [5:2]     Size
Bit [1]        SBZ/UNP
Bit [0]        Enable

Figure 2-13 TCM Region Register c9 format 

Table 2-23 shows the bit assignments for the TCM Region Register. 


Table 2-23 TCM Region Register c9 

Bits      Function
[31:12]   Base address (physical address).
[11:6]    SBZ/UNP.
[5:2]     Size. The Size field reflects the value of the IRSIZE/DRSIZE
          macrocell inputs. The Size field encoding is shown in Table 2-24.
[1]       SBZ/UNP.
[0]       Enable bit:
          0 = disabled
          1 = enabled.


Table 2-24 TCM Size field encoding 

Memory size   Value
0KB/absent    b0000
Reserved      b0001, b0010
4KB           b0011
8KB           b0100
16KB          b0101
32KB          b0110
64KB          b0111
128KB         b1000
256KB         b1001
512KB         b1010
1MB           b1011
Reserved      b1100, b1101, b1110, b1111


If either the data or instruction TCM is disabled, then the contents of the respective 
TCM are not accessed. If the TCM is subsequently re-enabled, the contents have not 
been changed by the ARM926EJ-S processor. 


For a Harvard arrangement, the instruction-side TCM must be accessible for both reads 
and writes during normal operation, and for loading code, or for debug activity. This 
enables accesses to literal pools, undefined instruction emulation, and parameter 
passing for SWI operations. You must insert an Instruction Memory Barrier (IMB) 
between a write to the instruction TCM and the instructions being read from the 
instruction TCM. See Chapter 9 Instruction Memory Barrier for more details. 


Note 


Instruction fetches from the data TCM are not possible. An attempt to fetch an 
instruction from an address in the data TCM space does not result in an access to the 
data TCM, and the instruction is fetched from main memory. These accesses can result 
in external aborts, because the address range might not be supported in main memory. 





The instruction TCM must not be programmed to the same base address as the data 
TCM. If the two TCMs are of different sizes, the regions in physical memory must not 
overlap. If they do overlap, it is Unpredictable which memory is accessed. 


Note 
The base address value setting must be aligned to the TCM size. 
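
The register layout, the size encoding, and the alignment and overlap rules above can be combined in a short Python sketch. The function names decode_tcm_region and check_tcm_regions are hypothetical. The sketch relies on the observation that every non-reserved size in the Size field encoding equals 512 bytes shifted left by the field value, and it reports reserved encodings as a size of None.

```python
def decode_tcm_region(value):
    """Split a TCM Region Register value into base, size in bytes, and enable."""
    size_field = (value >> 2) & 0xF          # bits [5:2]
    if size_field == 0:
        size = 0                             # 0KB/absent
    elif 3 <= size_field <= 11:
        size = 512 << size_field             # b0011 = 4KB up to b1011 = 1MB
    else:
        size = None                          # reserved encoding
    return {
        "base": value & 0xFFFFF000,          # bits [31:12]
        "size": size,
        "enabled": bool(value & 1),          # bit [0]
    }


def check_tcm_regions(itcm, dtcm):
    """Return a list of rule violations for two decoded TCM regions."""
    problems = []
    for name, region in (("instruction", itcm), ("data", dtcm)):
        if region["size"] and region["base"] % region["size"]:
            problems.append(name + " TCM base is not aligned to its size")
    if itcm["size"] and dtcm["size"]:
        itcm_end = itcm["base"] + itcm["size"]
        dtcm_end = dtcm["base"] + dtcm["size"]
        if itcm["base"] < dtcm_end and dtcm["base"] < itcm_end:
            problems.append("instruction and data TCM regions overlap")
    return problems
```

A check of this kind could be run on the intended register values before they are written, because overlapping regions make it Unpredictable which memory is accessed.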





2.3.11  TLB Lockdown Register c10 


The TLB Lockdown Register controls where hardware page table walks place the TLB 
entry, in the set associative region or the lockdown region of the TLB, and if in the 
lockdown region, which entry is written. The lockdown region of the TLB contains 
eight entries. See TLB structure on page 3-30 for a description of the structure of the 
TLB. 


Writing the TLB Lockdown Register with the preserve bit (P bit) set to: 


1 Means subsequent hardware page table walks place the TLB entry in the 
  lockdown region at the entry specified by the victim, in the range 0 to 7. 


0 Means subsequent hardware page table walks place the TLB entry in the 
  set associative region of the TLB. 


TLB entries in the lockdown region are preserved so that invalidate TLB operations 
only invalidate the unpreserved entries in the TLB. That is, those in the set-associative 
region. Invalidate TLB single entry operations invalidate any TLB entry corresponding 
to the Modified Virtual Address given in Rd, regardless of their preserved state. That is, 
if they are in the lockdown or set-associative regions of the TLB. See TLB Operations 
Register c8 on page 2-23 for a description of the TLB invalidate operations. 


The instructions you can use to program the TLB Lockdown Register are shown in 
Table 2-25. 


Table 2-25 Programming the TLB Lockdown Register 

Function                         Instruction
Read data TLB lockdown victim    MRC p15, 0, <Rd>, c10, c0, 0
Write data TLB lockdown victim   MCR p15, 0, <Rd>, c10, c0, 0


Figure 2-14 shows the TLB Lockdown Register format. 

Bits [31:29]   SBZ
Bits [28:26]   Victim
Bits [25:1]    SBZ/UNP
Bit [0]        P

Figure 2-14 TLB Lockdown Register format 


The victim automatically increments after any table walk that results in an entry being 
written into the lockdown part of the TLB. 





Note 

It is not possible for a lockdown entry to entirely map either small or large pages, unless 
all the subpage access permissions are identical. Entries can still be written into the 
lockdown region, but the address range that is mapped only covers the subpage 
corresponding to the address that was used to perform the page table walk. 


Example 2-1 on page 2-32 is a code sequence that locks down an entry to the current 
victim. 


Example 2-1 Lock down an entry to the current victim 


ADR r1,LockAddr         ; set r1 to the value of the address to be locked down 
MCR p15,0,r1,c8,c7,1    ; invalidate TLB single entry to ensure that 
                        ; LockAddr is not already in the TLB 
MRC p15,0,r0,c10,c0,0   ; read the lockdown register 
ORR r0,r0,#1            ; set the preserve bit 
MCR p15,0,r0,c10,c0,0   ; write to the lockdown register 
LDR r1,[r1]             ; TLB will miss, and entry will be loaded 
MRC p15,0,r0,c10,c0,0   ; read the lockdown register (victim will have 
                        ; incremented) 
BIC r0,r0,#1            ; clear preserve bit 
MCR p15,0,r0,c10,c0,0   ; write to the lockdown register 
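
The register values handled by this sequence can be modelled with a short Python sketch. The names lockdown_value and after_locked_walk are hypothetical. The sketch assumes that the victim field occupies bits [28:26] and that the P bit is bit 0, which matches the ORR and BIC with #1 in Example 2-1. The wrap from entry 7 back to entry 0 is an assumption of this sketch and is not stated in the text.

```python
VICTIM_SHIFT = 26
VICTIM_MASK = 0x7 << VICTIM_SHIFT  # assumed position: bits [28:26]
P_BIT = 0x1                        # preserve bit, bit [0]


def lockdown_value(victim, preserve):
    """Build a TLB Lockdown Register value for a victim entry in the range 0 to 7."""
    if not 0 <= victim <= 7:
        raise ValueError("victim must be in the range 0 to 7")
    return (victim << VICTIM_SHIFT) | (P_BIT if preserve else 0)


def after_locked_walk(value):
    """Model the victim increment after a walk that fills the lockdown region."""
    if not value & P_BIT:
        return value  # entry went to the set-associative region, victim unchanged
    victim = (value & VICTIM_MASK) >> VICTIM_SHIFT
    victim = (victim + 1) % 8  # wrap-around is an assumption of this sketch
    return (value & ~VICTIM_MASK) | (victim << VICTIM_SHIFT)
```

Reading the register back after the LDR, as Example 2-1 does, is what lets software observe the incremented victim before it clears the P bit.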


2.3.12 Register c11 and c12 


Accessing, reading or writing, these registers causes Unpredictable behavior. 


2.3.13 Process ID Register c13 


Register c13 accesses the process identifier registers. The register accessed depends on 
the value of the Opcode 2 field: 


Opcode 2=0 Selects the Fast Context Switch Extension (FCSE) Process 
           Identifier (PID) Register. 


Opcode 2=1 Selects the Context ID Register. 

You can use the process ID register to determine the process that is currently running. 
The process identifier is set to 0 at reset. 

FCSE PID Register 


Addresses issued by the ARM9EJ-S core in the range 0 to 32MB are translated in 
accordance with the value contained in this register. Address A becomes A + (FCSE 
PID x 32MB). It is this modified address that is seen by the caches, MMU, and TCM 
interface. Addresses above 32MB are not modified. The FCSE PID is a seven-bit field, 
enabling 128 x 32MB processes to be mapped. 


If the FCSE PID is 0, there is a flat mapping between the virtual addresses output by the 
ARM9EJ-S core and the modified virtual addresses used by the caches, MMU, and 
TCM interface. The FCSE PID is set to 0 at system reset. 


If the MMU is disabled, then no FCSE address translation occurs. 


FCSE translation is not applied for addresses used for entry based cache or TLB 
maintenance operations. For these operations VA = MVA. 
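
The translation rule can be expressed as a short Python sketch. The function names fcse_mva and fcse_pid_register are hypothetical. The sketch covers only the case where FCSE translation applies, and it assumes the seven-bit PID is held in the top bits of the register, bits [31:25], as the shift by 25 in the fast context switch example suggests.

```python
FCSE_REGION = 32 * 1024 * 1024  # 32MB


def fcse_mva(va, fcse_pid):
    """Return the modified virtual address for a virtual address va."""
    if not 0 <= fcse_pid < 128:
        raise ValueError("FCSE PID is a seven-bit field")
    if va < FCSE_REGION:
        return va + fcse_pid * FCSE_REGION  # A becomes A + (FCSE PID x 32MB)
    return va                               # addresses above 32MB are not modified


def fcse_pid_register(fcse_pid):
    """Return the value to write to the FCSE PID Register (PID in bits [31:25])."""
    if not 0 <= fcse_pid < 128:
        raise ValueError("FCSE PID is a seven-bit field")
    return fcse_pid << 25
```

With FCSE PID 1, address 0x8000 becomes 0x02008000, while an address at or above 32MB passes through unchanged.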


Table 2-26 shows the ARM instructions that can be used to access the FCSE PID 
Register. 


Table 2-26 FCSE PID Register operations 

Function         Data       ARM Instruction
Read FCSE PID    FCSE PID   MRC p15, 0, <Rd>, c13, c0, 0
Write FCSE PID   FCSE PID   MCR p15, 0, <Rd>, c13, c0, 0


The format of the FCSE PID Register is shown in Figure 2-15. 

Bits [31:25]   FCSE PID
Bits [24:0]    SBZ

Figure 2-15 Process ID Register format 


Performing a fast context switch 


You can perform a fast context switch by writing to CP15 register c13 with Opcode 2 
= 0. The contents of the caches and the TLB do not have to be flushed after a fast context 
switch because they still hold valid address tags. The two instructions after the FCSE 
PID has been written have been fetched with the old FCSE PID, as the following code 
example shows: 


{FCSE PID = 0} 
MOV r0, #1:SHL:25       ; Fetched with FCSE PID = 0 
MCR p15,0,r0,c13,c0,0   ; Fetched with FCSE PID = 0 
A1                      ; Fetched with FCSE PID = 0 
A2                      ; Fetched with FCSE PID = 0 
A3                      ; Fetched with FCSE PID = 1 


Where A1, A2, and A3 are the three instructions following the fast context switch. 


Context ID Register 


The Context ID Register provides a mechanism to enable real-time trace tools to 
identify the currently executing process in multi-tasking environments. 

The contents of this register are replicated on the ETMPROCID pins of the 
ARM926EJ-S processor. ETMPROCIDWR is pulsed when a write occurs to the 
Context ID Register. 


Table 2-27 shows the ARM instructions that you can use to access the Context ID 
Register. 


Table 2-27 Context ID register operations 

Function           Data         ARM Instruction
Read context ID    Context ID   MRC p15, 0, <Rd>, c13, c0, 1
Write context ID   Context ID   MCR p15, 0, <Rd>, c13, c0, 1


The format of the Context ID Register, Rd, transferred during this operation is shown 
in Figure 2-16. 

Bits [31:0]   Context identifier

Figure 2-16 Context ID Register format 


2.3.14 Register c14 


Accessing, reading or writing, this register is reserved. 


2.3.15 Test and Debug Register c15 


You can use register c15 to provide device-specific test and debug operations in 
ARM926EJ-S processors. Appendix B CP15 Test and Debug Registers describes the 
registers and functions available using CP15 c15. This register is defined to be reserved 
for implementation-defined purposes in the ARM Architecture Reference Manual. If 
you write software that uses the device-specific facilities provided by c15, then this 
software is unlikely to be either backwards or forwards compatible. 


Memory Management Unit 


This chapter describes the Memory Management Unit (MMU). It contains the following 
sections: 

o About the MMU on page 3-2 
o Address translation on page 3-5 
o MMU faults and CPU aborts on page 3-20 
o Domain access control on page 3-23 
o Fault checking sequence on page 3-25 
o External aborts on page 3-28 
o TLB structure on page 3-30. 


3.1 About the MMU 


The ARM926EJ-S MMU is an ARM architecture v5 MMU. It provides virtual memory 
features required by systems operating on platforms such as Symbian OS, WindowsCE, 
and Linux. A single set of two-level page tables stored in main memory is used to 
control the address translation, permission checks, and memory region attributes for 
both data and instruction accesses. 


The MMU uses a single unified Translation Lookaside Buffer (TLB) to cache the 
information held in the page tables. 


To support both sections and pages, there are two levels of address translation. The 
MMU puts the translated physical addresses into the MMU Translation Lookaside 
Buffer (TLB). 


The MMU TLB has two parts: 
o the main TLB 
o the lockdown TLB. 


The main TLB is a two-way, set-associative cache for page table information. It has 32
entries per way for a total of 64 entries. The lockdown TLB is an eight-entry
fully-associative cache that contains locked TLB entries. Locking TLB entries can
ensure that a memory access to a given region never incurs the penalty of a page table
walk. For more details of the TLBs see TLB structure on page 3-30.


The MMU features are: 


o standard ARM architecture v4 and v5 MMU mapping sizes, domains, and access 
protection scheme 


o mapping sizes are 1MB (sections), 64KB (large pages), 4KB (small pages), and
1KB (tiny pages)

o access permissions for large pages and small pages can be specified separately for 
each quarter of the page (subpage permissions) 

o hardware page table walks 

o invalidate entire TLB using CP15 c8 

o invalidate TLB entry selected by MVA, using CP15 c8 

o lockdown of TLB entries using CP15 c10.


The following subsections are:
o Access permissions and domains
o Translated entries
o MMU program accessible registers.
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3.1.1 Access permissions and domains 


For large and small pages, access permissions are defined for each subpage: 1KB for
small pages and 16KB for large pages. Sections and tiny pages have a single set of access
permissions.


All regions of memory have an associated domain. A domain is the primary access
control mechanism for a region of memory. It defines the conditions necessary for an
access to proceed. The domain determines if:


o access permissions are used to qualify the access
o the access is unconditionally allowed to proceed
o the access is unconditionally aborted.


In the latter two cases, the access permission attributes are ignored. 


There are 16 domains. These are configured using the domain access control register. 
See Domain Access Control Register c3 on page 2-17. 


3.1.2 Translated entries 


ARM DDI 0198E 


The main TLB caches 64 translated entries. If, during a memory access, the main TLB
contains a translated entry for the MVA, the MMU reads the protection data to
determine if the access is permitted:


o If access is permitted and an off-chip access is required, the MMU outputs the
appropriate physical address corresponding to the MVA.


o If access is permitted and an off-chip access is not required, the cache or TCM
services the access.


o If access is not permitted, the MMU signals the CPU core to abort.


If the TLB misses because it does not contain an entry for the MVA, the translation table
walk hardware is invoked to retrieve the translation information from a translation table
in physical memory. When retrieved, the translation information is written into the
TLB, possibly overwriting an existing value.


To enable use of TLB locking features, the location to be written can be specified using
the CP15 c10 TLB Lockdown Register.


At reset the MMU is turned off, no address mapping occurs, and all regions are marked
as noncacheable and nonbufferable.
3.1.3 MMU program accessible registers


Table 3-1 shows the CP15 registers that are used in conjunction with page table
descriptors stored in memory to determine the operation of the MMU.


Table 3-1 MMU program-accessible CP15 registers

Register                  Bits             Register description

Control register c1       M, A, S, R       Contains bits to enable the MMU (M bit), enable data address
                                           alignment checks (A bit), and to control the access
                                           protection scheme (S bit and R bit).

Translation table base    [31:14]          Holds the physical address of the base of the translation
register c2                                table maintained in main memory. This base address must be
                                           on a 16KB boundary.

Domain access control     [31:0]           Comprises 16 two-bit fields. Each field defines the access
register c3                                control attributes for one of 16 domains (D15 to D0).

Fault status registers,   [7:0]            Indicates the cause of a Data or Prefetch Abort, and the
IFSR and DFSR, c5                          domain number of the aborted access, when an abort occurs.
                                           Bits [7:4] specify which of the 16 domains (D15 to D0) was
                                           being accessed when a fault occurred. Bits [3:0] indicate
                                           the type of access being attempted. The value of all other
                                           bits is Unpredictable. The encoding of these bits is shown
                                           in Table 3-9 on page 3-21.

Fault address             [31:0]           Holds the MVA associated with the access that caused the
register c6                                Data Abort. See Table 3-9 on page 3-21 for details of the
                                           address stored for each type of fault. The ARM9EJ-S register
                                           R14_abt holds the VA associated with a Prefetch Abort.

TLB operations            [31:0]           This register is used to perform TLB maintenance operations.
register c8                                These are either invalidating all the (unpreserved) entries
                                           in the TLB, or invalidating a specific entry.

TLB lockdown              [28:26] and [0]  Enables specific page table entries to be locked into the
register c10                               TLB. Locking entries in the TLB guarantees that accesses to
                                           the locked page or section can proceed without incurring the
                                           time penalty of a TLB miss. This enables the execution
                                           latency for time-critical pieces of code such as interrupt
                                           handlers to be minimized.


All the CP15 MMU registers, except c8, contain state that can be read using MRC 
instructions, and written using MCR instructions. Registers c5 and c6 are also written 
by the MMU during an abort. Writing to c8 causes the MMU to perform a TLB 
operation, to manipulate TLB entries. This register is write-only. 


The CP15 registers are described in Chapter 2 Programmer's Model.
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3.2 Address translation 
The VA generated by the CPU core is converted to a Modified Virtual Address (MVA)
by the FCSE using the value held in CP15 c13. The MMU translates MVAs into
physical addresses to access external memory, and also performs access permission
checking.


The MMU table-walking hardware is used to add entries to the TLB. The translation
information that comprises both the address translation data and the access permission
data resides in a translation table located in physical memory. The MMU provides the
logic for automatically traversing this translation table and loading entries into the TLB.


The number of stages in the hardware table walking and permission checking process
is one or two depending on whether the address is marked as a section-mapped access
or a page-mapped access.


There are three sizes of page-mapped accesses and one size of section-mapped access. 
Page-mapped accesses are for: 


o large pages 
o small pages 
o tiny pages. 


The translation process always begins in the same way, with a level one fetch. A
section-mapped access requires only a level one fetch, but a page-mapped access
requires an additional level two fetch.


The following subsections are:

o Translation table base
o First-level fetch
o First-level descriptor
o Section descriptor
o Coarse page table descriptor
o Fine page table descriptor
o Translating section references
o Second-level descriptor
o Translating large page references
o Translating small page references
o Translating tiny page references.
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3.2.1 Translation table base 


The hardware translation process is initiated when the TLB does not contain a
translation for the requested MVA. The Translation Table Base Register (TTBR), CP15
register c2, points to the base address of a table in physical memory that contains section
or page descriptors, or both. The 14 low-order bits [13:0] of the TTBR are
Unpredictable on a read, and the table must reside on a 16KB boundary. Figure 3-1
shows the format of the TTBR.


[31:14] Translation table base | [13:0] Unpredictable on a read


Figure 3-1 Translation Table Base Register


The translation table has up to 4096 x 32-bit entries, each describing 1MB of virtual
memory. This enables up to 4GB of virtual memory to be addressed.


Figure 3-2 on page 3-7 shows the table walk process. 
Level one fetch:
o the translation table at the TTB base has 4096 entries and is indexed by modified
virtual address bits [31:20].

Level two fetch:
o section base: a 1MB section, indexed by modified virtual address bits [19:0]
o coarse page table base: a coarse page table of 256 entries, indexed by modified
virtual address bits [19:12], that selects a 64KB large page (indexed by bits [15:0])
or a 4KB small page (indexed by bits [11:0])
o fine page table base: a fine page table of 1024 entries, indexed by modified virtual
address bits [19:10], that selects a large page, a small page, or a 1KB tiny page
(indexed by bits [9:0]).


Figure 3-2 Translating page tables


3.2.2 First-level fetch 


Bits [31:14] of the TTBR are concatenated with bits [31:20] of the MVA to produce a
30-bit address as shown in Figure 3-3 on page 3-8.
Modified virtual address:        [31:20] Table index | [19:0]
Translation table base:          [31:14] Translation base | [13:0]
First-level descriptor address:  [31:14] Translation base | [13:2] Table index | [1:0] 00
First-level descriptor:          [31:0]


Figure 3-3 Accessing translation table first-level descriptors





This address selects a 4-byte translation table entry. This is a first-level descriptor for
either a section or a page table.
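
The address formation described above can be sketched in C. This is an illustrative sketch only: the function name l1_descriptor_address is hypothetical, and the caller is assumed to have already read the TTBR value from CP15 c2.

```c
#include <stdint.h>

/* Form the address of the first-level descriptor for a given MVA.
 * ttbr is the value of the Translation Table Base Register (CP15 c2).
 * The function name is illustrative and not part of any ARM API. */
static uint32_t l1_descriptor_address(uint32_t ttbr, uint32_t mva)
{
    uint32_t translation_base = ttbr & 0xFFFFC000u; /* TTBR bits [31:14] */
    uint32_t table_index = mva >> 20;               /* MVA bits [31:20] */

    /* The table index occupies address bits [13:2]; bits [1:0] are 00
     * because each descriptor is four bytes. */
    return translation_base | (table_index << 2);
}
```

For example, with a translation table at 0x80004000, the MVA 0x12345678 has table index 0x123, so the descriptor is fetched from 0x8000448C.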


3.2.3 First-level descriptor 

The first-level descriptor returned is a section descriptor, a coarse page table descriptor,
or a fine page table descriptor, or is invalid. Figure 3-4 shows the format of a first-level
descriptor.


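Fault:              [31:2] ignored | [1:0] 00
Coarse page table:  [31:10] Coarse page table base address | [9] SBZ | [8:5] Domain | [4] 1 | [3:2] SBZ | [1:0] 01
Section:            [31:20] Section base address | [19:12] SBZ | [11:10] AP | [9] SBZ | [8:5] Domain | [4] 1 | [3] C | [2] B | [1:0] 10
Fine page table:    [31:12] Fine page table base address | [11:9] SBZ | [8:5] Domain | [4] 1 | [3:2] SBZ | [1:0] 11


Figure 3-4 First-level descriptor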


A section descriptor provides the base address of a 1MB block of memory.

The page table descriptors provide the base address of a page table that contains
second-level descriptors. There are two sizes of page table:


o coarse page tables have 256 entries, splitting the 1MB that the table describes into
4KB blocks


o fine page tables have 1024 entries, splitting the 1MB that the table describes into
1KB blocks.


First-level descriptor bit assignments are shown in Table 3-2. 


Table 3-2 First-level descriptor bits

Bits
Section  Coarse   Fine     Description

[31:20]  [31:10]  [31:12]  These bits form the corresponding bits of the physical
                           address.
[19:12]  -        -        Should Be Zero.
[11:10]  -        -        Access permission bits. Access permissions and domains on
                           page 3-3 and Fault address and fault status registers on
                           page 3-20 show how to interpret the access permission bits.
[9]      [9]      [11:9]   Should Be Zero.
[8:5]    [8:5]    [8:5]    Domain control bits.
[4]      [4]      [4]      Must be 1.
[3:2]    -        -        Bits C and B indicate whether the area of memory mapped
                           by this section is treated as write-back cacheable,
                           write-through cacheable, noncached buffered, or noncached
                           nonbuffered.
-        [3:2]    [3:2]    Should Be Zero.
[1:0]    [1:0]    [1:0]    These bits indicate the page size and validity and are
                           interpreted as shown in Table 3-3 on page 3-10.
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The two least significant bits of the first-level descriptor indicate the descriptor type as 
shown in Table 3-3. 


Table 3-3 Interpreting first-level descriptor bits [1:0]

Value  Meaning            Description
00     Invalid            Generates a section translation fault
01     Coarse page table  Indicates that this is a coarse page table descriptor
10     Section            Indicates that this is a section descriptor
11     Fine page table    Indicates that this is a fine page table descriptor
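
As a minimal sketch, the encoding in Table 3-3 maps directly onto a C enumeration. The type and function names here are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* First-level descriptor types, numbered by the value of bits [1:0]
 * as listed in Table 3-3. The names are illustrative. */
typedef enum {
    L1_INVALID = 0, /* 00: generates a section translation fault */
    L1_COARSE  = 1, /* 01: coarse page table descriptor */
    L1_SECTION = 2, /* 10: section descriptor */
    L1_FINE    = 3  /* 11: fine page table descriptor */
} l1_type_t;

/* Classify a first-level descriptor by its two least significant bits. */
static l1_type_t l1_descriptor_type(uint32_t descriptor)
{
    return (l1_type_t)(descriptor & 0x3u);
}
```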


3.2.4 Section descriptor 


A section descriptor provides the base address of a 1MB block of memory. Figure 3-5
shows the format of a section descriptor.


[31:20] Section base address | [19:12] SBZ | [11:10] AP | [9] SBZ | [8:5] Domain | [4] 1 | [3] C | [2] B | [1:0] 10


Figure 3-5 Section descriptor


Section descriptor bit assignments are described in Table 3-4.

Table 3-4 Section descriptor bits

Bits     Description
[31:20]  Form the corresponding bits of the physical address for a section
[19:12]  Always written as 0
[11:10]  The AP bits specify the access permissions for this section
[9]      Always written as 0
[8:5]    Specify one of the 16 possible domains, held in the domain access control register,
         that contain the primary access controls
[4]      Should be written as 1, for backwards compatibility
[3:2]    These bits (C and B) indicate if the area of memory mapped by this section is
         treated as write-back cacheable, write-through cacheable, noncached buffered, or
         noncached nonbuffered
[1:0]    These bits must be 10 to indicate a section descriptor
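
The bit assignments in Table 3-4 can be packed into a descriptor word as follows. This is a sketch under stated assumptions: the function name and parameter names are hypothetical, and the C and B values are passed through unchanged because their individual meanings are not defined in this section.

```c
#include <stdint.h>

/* Build a section descriptor from the fields in Table 3-4.
 * phys_base: physical address of the 1MB section (only bits [31:20] are used)
 * ap:        access permission bits, placed in [11:10]
 * domain:    domain number 0 to 15, placed in [8:5]
 * c, b:      cacheable and bufferable bits, placed in [3] and [2]
 * All names are illustrative and not part of any ARM API. */
static uint32_t make_section_descriptor(uint32_t phys_base, unsigned ap,
                                        unsigned domain, unsigned c, unsigned b)
{
    return (phys_base & 0xFFF00000u)          /* [31:20] section base address */
         | ((uint32_t)(ap & 0x3u) << 10)      /* [11:10] AP */
         | ((uint32_t)(domain & 0xFu) << 5)   /* [8:5] domain */
         | (1u << 4)                          /* [4] written as 1 */
         | ((uint32_t)(c & 0x1u) << 3)        /* [3] C */
         | ((uint32_t)(b & 0x1u) << 2)        /* [2] B */
         | 0x2u;                              /* [1:0] = 10, section */
}
```

Bits [19:12] and [9] are left as 0, as the table requires.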


3.2.5  Coarse page table descriptor 


A coarse page table descriptor provides the base address of a page table that contains
second-level descriptors for either large page or small page accesses. Coarse page tables
have 256 entries, splitting the 1MB that the table describes into 4KB blocks. Figure 3-6
shows the format of a coarse page table descriptor.


[31:10] Coarse page table base address | [9] SBZ | [8:5] Domain | [4] 1 | [3:2] SBZ | [1:0] 01


Figure 3-6 Coarse page table descriptor


Note

If a coarse page table descriptor is returned from the first-level fetch, a second-level
fetch is initiated.





Coarse page table descriptor bit assignments are described in Table 3-5.

Table 3-5 Coarse page table descriptor bits

Bits     Description
[31:10]  These bits form the base for referencing the second-level descriptor (the coarse
         page table index for the entry is derived from the MVA)
[9]      Always written as 0
[8:5]    These bits specify one of the 16 possible domains, held in the domain access
         control register, that contain the primary access controls
[4]      Always written as 1
[3:2]    Always written as 0
[1:0]    These bits must be 01 to indicate a coarse page table descriptor


3.2.6 Fine page table descriptor 


A fine page table descriptor provides the base address of a page table that contains
second-level descriptors for large page, small page, or tiny page accesses. Fine page
tables have 1024 entries, splitting the 1MB that the table describes into 1KB blocks.
Figure 3-7 shows the format of a fine page table descriptor.


[31:12] Fine page table base address | [11:9] SBZ | [8:5] Domain | [4] 1 | [3:2] SBZ | [1:0] 11


Figure 3-7 Fine page table descriptor


Note

If a fine page table descriptor is returned from the first-level fetch, a second-level fetch
is initiated.


Table 3-6 shows the fine page table descriptor bit assignments.

Table 3-6 Fine page table descriptor bits

Bits     Description
[31:12]  These bits form the base for referencing the second-level descriptor (the fine page
         table index for the entry is derived from the MVA)
[11:9]   Always written as 0
[8:5]    These bits specify one of the 16 possible domains, held in the domain access control
         register, that contain the primary access controls
[4]      Always written as 1
[3:2]    Always written as 0
[1:0]    These bits must be 11 to indicate a fine page table descriptor
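
The second-level fetch address can be sketched for both page table sizes. This sketch assumes the index fields given in this chapter: a coarse page table is indexed by MVA bits [19:12] and a fine page table by MVA bits [19:10], with each entry four bytes wide. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Form the address of the second-level descriptor.
 * l1_desc is a coarse or fine page table descriptor returned by the
 * first-level fetch. Returns false if l1_desc is not a page table
 * descriptor, in which case no second-level fetch takes place.
 * The function name is illustrative. */
static bool l2_descriptor_address(uint32_t l1_desc, uint32_t mva, uint32_t *addr)
{
    switch (l1_desc & 0x3u) {
    case 0x1u: /* coarse: base [31:10], 256 entries indexed by MVA [19:12] */
        *addr = (l1_desc & 0xFFFFFC00u) | (((mva >> 12) & 0xFFu) << 2);
        return true;
    case 0x3u: /* fine: base [31:12], 1024 entries indexed by MVA [19:10] */
        *addr = (l1_desc & 0xFFFFF000u) | (((mva >> 10) & 0x3FFu) << 2);
        return true;
    default:   /* invalid or section descriptor */
        return false;
    }
}
```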
3.2.7 Translating section references


Figure 3-8 shows the complete section translation sequence. 


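Modified virtual address:        [31:20] Table index | [19:0] Section index
Translation table base:          [31:14] Translation base | [13:0]
First-level descriptor address:  [31:14] Translation base | [13:2] Table index | [1:0] 00
Section first-level descriptor:  [31:20] Section base address | [19:12] SBZ | [11:10] AP | [9] SBZ | [8:5] Domain | [4] 1 | [3] C | [2] B | [1:0] 10
Physical address:                [31:20] Section base address | [19:0] Section index


Figure 3-8 Section translation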
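
The final step of the section translation, combining the section base address from the descriptor with the low-order 20 bits of the MVA, can be sketched as follows. The function name is hypothetical, and the sketch assumes the descriptor has already been identified as a section descriptor.

```c
#include <stdint.h>

/* Combine a section descriptor with the MVA to form the physical address.
 * Assumes bits [1:0] of section_desc are 10. The name is illustrative. */
static uint32_t section_physical_address(uint32_t section_desc, uint32_t mva)
{
    uint32_t section_base = section_desc & 0xFFF00000u; /* descriptor [31:20] */
    uint32_t section_index = mva & 0x000FFFFFu;         /* MVA [19:0] */
    return section_base | section_index;
}
```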


3.2.8 Second-level descriptor


If the first-level fetch returns either a coarse page table descriptor or a fine page table
descriptor, this provides the base address of the page table to be used. The page table is
then accessed and a second-level descriptor is returned. Figure 3-9 on page 3-14 shows
the format of second-level descriptors.
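Fault:       [31:2] ignored | [1:0] 00
Large page:  [31:16] Large page base address | [15:12] SBZ | [11:10] ap3 | [9:8] ap2 | [7:6] ap1 | [5:4] ap0 | [3] C | [2] B | [1:0] 01
Small page:  [31:12] Small page base address | [11:10] ap3 | [9:8] ap2 | [7:6] ap1 | [5:4] ap0 | [3] C | [2] B | [1:0] 10
Tiny page:   [31:10] Tiny page base address | [9:6] SBZ | [5:4] ap | [3] C | [2] B | [1:0] 11


Figure 3-9 Second-level descriptor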


A second-level descriptor defines a tiny, a small, or a large page descriptor, or is invalid:


o a large page descriptor provides the base address of a 64KB block of memory 
o a small page descriptor provides the base address of a 4KB block of memory 
o a tiny page descriptor provides the base address of a 1KB block of memory. 


Coarse page tables provide base addresses for either small or large pages. Large page
descriptors must be repeated in 16 consecutive entries. Each small page descriptor
occupies a single entry.


Fine page tables provide base addresses for large, small, or tiny pages. Large page
descriptors must be repeated in 64 consecutive entries. Small page descriptors must be
repeated in four consecutive entries, and each tiny page descriptor occupies a single
entry.
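
The replication rule for a large page in a coarse page table can be sketched as follows. This is a minimal illustration, assuming the coarse page table is held in an ordinary array of 256 words. The function and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define COARSE_TABLE_ENTRIES 256u

/* Write a large page descriptor into the 16 consecutive coarse page table
 * entries that cover the 64KB page containing mva. The coarse table index
 * is MVA bits [19:12]; clearing its low four bits gives the first of the
 * 16 entries. Names are illustrative. */
static void map_large_page_coarse(uint32_t table[COARSE_TABLE_ENTRIES],
                                  uint32_t mva, uint32_t large_page_desc)
{
    size_t first = (mva >> 12) & 0xF0u;
    for (size_t i = 0; i < 16; i++) {
        table[first + i] = large_page_desc;
    }
}
```

The same pattern applies to a fine page table, where the replication count is 64 for a large page and four for a small page.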


Second-level descriptor bit assignments are described in Table 3-7.

Table 3-7 Second-level descriptor bits

Bits
Large    Small    Tiny     Description

[31:16]  [31:12]  [31:10]  These bits form the corresponding bits of the physical
                           address.
[15:12]  -        [9:6]    Should Be Zero.
[11:4]   [11:4]   [5:4]    Access permission bits. Domain access control on page 3-23
                           and Fault checking sequence on page 3-25 show how to
                           interpret the access permission bits.
[3:2]    [3:2]    [3:2]    These bits, C and B, indicate whether the area of memory
                           mapped by this page is treated as write-back cacheable,
                           write-through cacheable, noncached buffered, or noncached
                           nonbuffered.
[1:0]    [1:0]    [1:0]    These bits indicate the page size and validity and are
                           interpreted as shown in Table 3-8.


The two least significant bits of the second-level descriptor indicate the descriptor type
as shown in Table 3-8.


Table 3-8 Interpreting page table entry bits [1:0]

Value  Meaning     Description
00     Invalid     Generates a page translation fault
01     Large page  Indicates that this is a 64KB page
10     Small page  Indicates that this is a 4KB page
11     Tiny page   Indicates that this is a 1KB page


Note

Tiny pages do not support subpage permissions and therefore only have one set of
access permission bits.


3.2.9  Translating large page references 


Figure 3-10 on page 3-16 shows the complete translation sequence for a 64KB large
page.


ARM DDI 0198E Copyright © 2001-2008 ARM Limited. All rights reserved.




Modified virtual address:         [31:20] Table index | [19:12] L2 table index | [15:0] Page index
Translation table base:           [31:14] Translation base | [13:0]
First-level descriptor address:   [31:14] Translation base | [13:2] Table index | [1:0] 00
First-level descriptor:           [31:10] Coarse page table base address | [9] SBZ | [8:5] Domain | [4] 1 | [3:2] SBZ | [1:0] 01
Second-level descriptor address:  [31:10] Coarse page table base address | [9:2] L2 table index | [1:0] 00
Second-level descriptor:          [31:16] Page base address | [15:12] SBZ | [11:4] ap3 to ap0 | [3] C | [2] B | [1:0] 01
Physical address:                 [31:16] Page base address | [15:0] Page index


Figure 3-10 Large page translation from a coarse page table


Because the upper four bits of the page index and low-order four bits of the coarse page 
table index overlap, each coarse page table entry for a large page must be duplicated 16 
times, in consecutive memory locations, in the coarse page table. 


If a large page descriptor is included in a fine page table, the high-order six bits of the
page index and low-order six bits of the fine page table index overlap. Each fine page
table entry for a large page must therefore be duplicated 64 times.
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3.2.10 Translating small page references 


Figure 3-11 shows the complete translation sequence for a 4KB small page. 


Modified virtual address:         [31:20] Table index | [19:12] Level two table index | [11:0] Page index
Translation table base:           [31:14] Translation base | [13:0]
First-level descriptor address:   [31:14] Translation base | [13:2] Table index | [1:0] 00
First-level descriptor:           [31:10] Coarse page table base address | [9] SBZ | [8:5] Domain | [4] 1 | [3:2] SBZ | [1:0] 01
Second-level descriptor address:  [31:10] Coarse page table base address | [9:2] L2 table index | [1:0] 00
Second-level descriptor:          [31:12] Page base address | [11:4] ap3 to ap0 | [3] C | [2] B | [1:0] 10
Physical address:                 [31:12] Page base address | [11:0] Page index


Figure 3-11 Small page translation from a coarse page table

If a small page descriptor is included in a fine page table, the upper two bits of the page
index and low-order two bits of the fine page table index overlap. Each fine page table
entry for a small page must therefore be duplicated four times.
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3.2.11  Translating tiny page references 


Figure 3-12 shows the complete translation sequence for a 1KB tiny page. 


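Modified virtual address:         [31:20] Table index | [19:10] Level two table index | [9:0] Page index
Translation table base:           [31:14] Translation base | [13:0]
First-level descriptor address:   [31:14] Translation base | [13:2] Table index | [1:0] 00
First-level descriptor:           [31:12] Fine page table base address | [11:9] SBZ | [8:5] Domain | [4] 1 | [3:2] SBZ | [1:0] 11
Second-level descriptor address:  [31:12] Fine page table base address | [11:2] L2 table index | [1:0] 00
Second-level descriptor:          [31:10] Page base address | [9:6] SBZ | [5:4] ap | [3] C | [2] B | [1:0] 11
Physical address:                 [31:10] Page base address | [9:0] Page index


Figure 3-12 Tiny page translation from a fine page table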

Page translation involves one additional step beyond that of a section translation. The
first-level descriptor is the fine page table descriptor and this is used to point to the
second-level descriptor.
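
The last step of a page translation, for all three page sizes, can be sketched as follows. The sketch assumes the second-level descriptor has already been fetched, and uses the page index widths given in this chapter (16 bits for a large page, 12 for a small page, 10 for a tiny page). The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Combine a second-level descriptor with the MVA to form the physical
 * address. Returns false if the descriptor is invalid (bits [1:0] = 00),
 * which corresponds to a page translation fault. The name is illustrative. */
static bool page_physical_address(uint32_t l2_desc, uint32_t mva, uint32_t *pa)
{
    uint32_t index_mask;

    switch (l2_desc & 0x3u) {
    case 0x1u: index_mask = 0x0000FFFFu; break; /* large page: MVA [15:0] */
    case 0x2u: index_mask = 0x00000FFFu; break; /* small page: MVA [11:0] */
    case 0x3u: index_mask = 0x000003FFu; break; /* tiny page:  MVA [9:0] */
    default:   return false;                    /* invalid descriptor */
    }

    /* Page base address from the descriptor, page index from the MVA. */
    *pa = (l2_desc & ~index_mask) | (mva & index_mask);
    return true;
}
```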
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Note 

The domain specified in the first-level descriptor and access permissions specified in
the second-level descriptor together determine whether the access has permission to
proceed. See section Domain access control on page 3-23 for details.





Subpages 


You can define access permissions for subpages of small and large pages. If, during a
page table walk, a small or large page has a different subpage permission, only the
subpage being accessed is written into the TLB. For example, a 16KB (large page)
subpage entry is written into the TLB if the subpage permission differs, and a 64KB
entry is put in the TLB if the subpage permissions are identical.


When you use subpage permissions, and the page entry then has to be invalidated, you 
must invalidate all four subpages separately. 
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3.3 MMU faults and CPU aborts 


The MMU generates an abort on the following types of faults: 


o alignment faults (data accesses only) 
o translation faults 

o domain faults 

o permission faults. 


In addition, an external abort can be raised by the external system. This can happen only 
for access types that have the core synchronized to the external system: 


o page walks 


o noncached reads 
o nonbuffered writes 
o noncached read-lock-write sequence (SWP). 


Alignment fault checking is enabled by the A bit in CP15 c1. Alignment fault checking
is not affected by whether or not the MMU is enabled. Translation, domain, and
permission faults are only generated when the MMU is enabled.


The access control mechanisms of the MMU detect the conditions that produce these 
faults. If a fault is detected as a result of a memory access, the MMU aborts the access 
and signals the fault condition to the CPU core. The MMU retains status and address 
information about faults generated by the data accesses in the data fault status register 
and fault address register. See Fault address and fault status registers. 


The MMU also retains status about faults generated by instruction fetches in the 
instruction fault status register. 


Note 


The address information for an instruction side abort is contained in the core link
register r14_abt.





An access violation for a given memory access inhibits any corresponding external 
access to the AHB interface, with an abort returned to the CPU core. 


3.3.1 Fault address and fault status registers 
On a Data Abort, the MMU places an encoded four-bit value, the fault status, along with
the four-bit encoded domain number, in the data FSR. Similarly, on a Prefetch Abort, the
MMU places the fault status and domain number in the instruction FSR, which is intended
for debug purposes only. In addition, the MVA associated with the Data Abort is latched
into the FAR. If an access violation simultaneously generates more than one source of
abort, they are encoded in the priority given in Table 3-9. The FAR is not updated by
faults caused by instruction prefetches.
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Fault status register (FSR) 


Table 3-9 shows the various access permissions and controls supported by the data
MMU, and how these are interpreted to generate faults.


Table 3-9 Priority encoding of fault status

Priority  Source                         Size             Status  Domain
Highest   Alignment                      -                b00x1   Invalid
          External abort on translation  First level      b1100   Invalid
                                         Second level     b1110   Valid
          Translation                    Section          b0101   Invalid
                                         Page             b0111   Valid
          Domain                         Section          b1001   Valid
                                         Page             b1011   Valid
          Permission                     Section          b1101   Valid
                                         Page             b1111   Valid
Lowest    External abort                 Section or page  b10x0   Invalid


Note

Alignment faults can write either b0001 or b0011 into FSR[3:0].

Invalid values can occur in the status bit encoding for domain faults. This happens when
the fault is raised before a valid domain field has been read from a page table
descriptor.

Aborts masked by a higher priority abort can be regenerated by fixing the cause of the
higher priority abort, and repeating the access.

Alignment faults are not possible for instruction fetches.

The instruction FSR can also be updated for instruction prefetch operations:


MCR p15, 0, <Rd>, c5, c0, 1
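
An abort handler might decode a fault status value along the lines of the following sketch. It assumes the FSR value has already been read from CP15 c5 into a variable. The enumeration and function names are hypothetical, and status values that do not appear in Table 3-9 are reported as unknown.

```c
#include <stdint.h>

/* Fault sources from Table 3-9. The names are illustrative. */
typedef enum {
    FAULT_UNKNOWN,
    FAULT_ALIGNMENT,
    FAULT_EXTERNAL_ABORT_L1_TRANSLATION,
    FAULT_EXTERNAL_ABORT_L2_TRANSLATION,
    FAULT_TRANSLATION_SECTION,
    FAULT_TRANSLATION_PAGE,
    FAULT_DOMAIN_SECTION,
    FAULT_DOMAIN_PAGE,
    FAULT_PERMISSION_SECTION,
    FAULT_PERMISSION_PAGE,
    FAULT_EXTERNAL_ABORT
} fault_source_t;

/* Decode the four-bit fault status held in FSR bits [3:0]. */
static fault_source_t fault_source(uint32_t fsr)
{
    switch (fsr & 0xFu) {
    case 0x1u: case 0x3u: return FAULT_ALIGNMENT;                     /* b00x1 */
    case 0xCu:            return FAULT_EXTERNAL_ABORT_L1_TRANSLATION; /* b1100 */
    case 0xEu:            return FAULT_EXTERNAL_ABORT_L2_TRANSLATION; /* b1110 */
    case 0x5u:            return FAULT_TRANSLATION_SECTION;           /* b0101 */
    case 0x7u:            return FAULT_TRANSLATION_PAGE;              /* b0111 */
    case 0x9u:            return FAULT_DOMAIN_SECTION;                /* b1001 */
    case 0xBu:            return FAULT_DOMAIN_PAGE;                   /* b1011 */
    case 0xDu:            return FAULT_PERMISSION_SECTION;            /* b1101 */
    case 0xFu:            return FAULT_PERMISSION_PAGE;               /* b1111 */
    case 0x8u: case 0xAu: return FAULT_EXTERNAL_ABORT;                /* b10x0 */
    default:              return FAULT_UNKNOWN;
    }
}

/* Extract the domain number held in FSR bits [7:4]. Table 3-9 lists the
 * fault sources for which this field is valid. */
static unsigned fault_domain(uint32_t fsr)
{
    return (unsigned)((fsr >> 4) & 0xFu);
}
```

Masking with 0xF and 0xF0 matters because all other FSR bits are Unpredictable.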
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Fault address register (FAR) 


For load and store instructions that can involve the transfer of more than one word 
(LDM/STM, LDRD, STRD, and STC/LDC), the value written into the FAR register 
depends on the type of access, and for external aborts, on whether or not the access 
crosses a 1KB boundary. Table 3-10 shows the FAR values for multi-word transfers. 


Table 3-10 FAR values for multi-word transfers

Source                               FAR
Alignment                            MVA of first aborted address in transfer.
External abort on translation        MVA of first aborted address in transfer.
Translation                          MVA of first aborted address in transfer.
Domain                               MVA of first aborted address in transfer.
Permission                           MVA of first aborted address in transfer.
External abort for noncached reads,  MVA of last address before 1KB boundary if any word of the
or nonbuffered writes                transfer before 1KB boundary is externally aborted.
                                     MVA of last address in transfer if the first externally
                                     aborted word is after 1KB boundary.


Compatibility Issues 


To enable code to be easily ported to ARM architecture v4 or v5 MMUs, or to future
architectures, it is recommended that no reliance is made on external abort behavior.


The instruction FSR is intended for debugging purposes only. Code that is intended to
be ported to other ARM architecture v4 or v5 MMUs must not use the instruction FSR.
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3.4 Domain access control 


MMU accesses are primarily controlled through the use of domains. There are 16
domains and each has a two-bit field to define access to it. Two types of user are
supported:
o clients
o managers.


The domains are defined in the domain access control register, CP15 c3. Figure 2-7 on
page 2-17 shows how the 32 bits of the register are allocated to define the 16 two-bit
domains.


Table 3-11 defines how the bits within each domain are interpreted to specify the access
permissions.

Table 3-11 Domain access control register, access control bits

Value  Meaning    Description
00     No access  Any access generates a domain fault.
01     Client     Accesses are checked against the access permission bits in
                  the section or page descriptor.
10     Reserved   Reserved. Currently behaves like the no access mode.
11     Manager    Accesses are not checked against the access permission
                  bits so a permission fault cannot be generated.
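
Extracting and updating one two-bit field of the domain access control register can be sketched as follows. The sketch assumes that domain D0 occupies bits [1:0] and domain D15 occupies bits [31:30], which matches the D15 to D0 ordering given for this register. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Access control values from Table 3-11. The names are illustrative. */
typedef enum {
    DOMAIN_NO_ACCESS = 0, /* 00 */
    DOMAIN_CLIENT    = 1, /* 01 */
    DOMAIN_RESERVED  = 2, /* 10: behaves like no access */
    DOMAIN_MANAGER   = 3  /* 11 */
} domain_access_t;

/* Read the two-bit field for one domain (0 to 15) from a CP15 c3 value. */
static domain_access_t domain_access(uint32_t dacr, unsigned domain)
{
    return (domain_access_t)((dacr >> (2u * (domain & 0xFu))) & 0x3u);
}

/* Return a copy of dacr with the field for one domain replaced. */
static uint32_t set_domain_access(uint32_t dacr, unsigned domain,
                                  domain_access_t access)
{
    unsigned shift = 2u * (domain & 0xFu);
    return (dacr & ~(0x3u << shift)) | ((uint32_t)access << shift);
}

/* True if any access in this domain generates a domain fault. */
static bool domain_access_faults(domain_access_t access)
{
    return access == DOMAIN_NO_ACCESS || access == DOMAIN_RESERVED;
}
```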


Table 3-12 shows how to interpret the Access Permission (AP) bits and how their
interpretation is dependent on the R and S bits, Control Register c1 bits [9:8].


Table 3-12 Interpreting access permission (AP) bits

AP  S  R  Privileged permissions  User permissions
00  0  0  No access               No access
00  1  0  Read-only               No access
00  0  1  Read-only               Read-only
00  1  1  Unpredictable           Unpredictable
01  x  x  Read/write              No access
10  x  x  Read/write              Read-only
11  x  x  Read/write              Read/write
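
Table 3-12 can be expressed as a small lookup function. This is a sketch for illustration only: the type and function names are hypothetical, and the Unpredictable row is reported as a distinct value rather than modeled.

```c
#include <stdbool.h>

/* Permission outcomes from Table 3-12. The names are illustrative. */
typedef enum {
    PERM_NO_ACCESS,
    PERM_READ_ONLY,
    PERM_READ_WRITE,
    PERM_UNPREDICTABLE
} permission_t;

/* Interpret the two AP bits for a privileged or user access, given the
 * S and R bits of Control Register c1. */
static permission_t ap_permission(unsigned ap, bool s, bool r, bool privileged)
{
    switch (ap & 0x3u) {
    case 0x0u:
        if (s && r) return PERM_UNPREDICTABLE;
        if (s)      return privileged ? PERM_READ_ONLY : PERM_NO_ACCESS;
        if (r)      return PERM_READ_ONLY;
        return PERM_NO_ACCESS;
    case 0x1u:
        return privileged ? PERM_READ_WRITE : PERM_NO_ACCESS;
    case 0x2u:
        return privileged ? PERM_READ_WRITE : PERM_READ_ONLY;
    default: /* AP = 11 */
        return PERM_READ_WRITE;
    }
}
```

The S and R bits only affect the result when AP is 00.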
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3.5 Fault checking sequence 


The sequence the MMU uses to check for access faults is different for sections and
pages. The sequence for both types of access is shown in Figure 3-13.


1. Check the address alignment of the modified virtual address. A misaligned address
generates an alignment fault.
2. Get the first-level descriptor. An invalid descriptor generates a section translation
fault.
3. For a page, get the page table entry. An invalid entry generates a page translation
fault.
4. Check the domain status. No access (00) or reserved (10) generates a section domain
fault or a page domain fault. Client (01) proceeds to the access permission check.
Manager (11) bypasses the access permission check.
5. Check the access permissions. A violation generates a section permission fault or a
page permission fault.
6. Output the physical address.


Figure 3-13 Sequence for checking faults





The conditions that generate each of the faults are described in:
o Alignment faults
o Translation faults
o Domain faults
o Permission faults.


3.5.1 Alignment faults 


If alignment fault checking is enabled (the A bit in CP15 c1 is set), the MMU generates
an alignment fault on any data word access if the address is not word-aligned, or on any
halfword access if the address is not halfword-aligned, irrespective of whether the
MMU is enabled or not. An alignment fault is not generated on any instruction fetch or
any byte access.





Note 


If an access generates an alignment fault, the access sequence aborts without reference 
to other permission checks. 


3.5.2 Translation faults 


There are two types of translation fault:


Section A section translation fault is generated if the level one descriptor is
marked as invalid. This happens if bits [1:0] of the descriptor are both 0.


Page A page translation fault is generated if the level two descriptor is marked
as invalid. This happens if bits [1:0] of the descriptor are both 0.


3.5.3 Domain faults 
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There are two types of domain fault: 


Section The level one descriptor holds the four-bit domain field that selects one
of the 16 two-bit domains in the domain access control register. The two
bits of the specified domain are then checked for access permissions as
described in Table 3-12 on page 3-23. The domain is checked when the
level one descriptor is returned.


Page The level one descriptor holds the four-bit domain field that selects one
of the 16 two-bit domains in the domain access control register. The two
bits of the specified domain are then checked for access permissions as
described in Table 3-12 on page 3-23. The domain is checked when the
level one descriptor is returned.


If the specified access is either no access (00), or reserved (10), then either a section
domain fault or page domain fault occurs.
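
As a rough illustration, the translation and domain checks on a level one descriptor might be modeled as below. The names are hypothetical. The position of the domain field (bits [8:5]) and the meaning of domain value 11 (manager, no permission check) are assumptions taken from the ARM architecture rather than from the text above.

```c
#include <stdint.h>

enum mmu_check {
    CHECK_TRANSLATION_FAULT,  /* descriptor bits [1:0] are both 0 */
    CHECK_DOMAIN_FAULT,       /* domain is no access (00) or reserved (10) */
    CHECK_PERMISSIONS,        /* client (01): AP bits must be checked next */
    CHECK_PASS                /* manager (11), assumed: AP bits are not checked */
};

/* Assumed position of the four-bit domain field in a level one descriptor. */
#define L1_DOMAIN_SHIFT 5u

enum mmu_check check_level_one(uint32_t l1_descriptor,
                               uint32_t domain_access_control)
{
    if ((l1_descriptor & 0x3u) == 0)
        return CHECK_TRANSLATION_FAULT;

    /* The domain field selects one of 16 two-bit fields in the register. */
    uint32_t domain = (l1_descriptor >> L1_DOMAIN_SHIFT) & 0xFu;
    uint32_t access = (domain_access_control >> (2u * domain)) & 0x3u;

    switch (access) {
    case 0x1u: return CHECK_PERMISSIONS;
    case 0x3u: return CHECK_PASS;
    default:   return CHECK_DOMAIN_FAULT;  /* 00 or 10 */
    }
}
```

The order of the two tests mirrors the fault-checking sequence: an invalid descriptor is reported before its domain field is examined.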
3.5.4 Permission faults


If the two-bit domain field returns 01 (client), then access permissions are checked as
follows:


Section If the level one descriptor defines a section-mapped access, the AP bits of
the descriptor define whether or not the access is allowed, according to
Table 3-12 on page 3-23. Their interpretation is dependent on the setting
of the S and R bits, CP15 c1 bits 8 and 9. If the access is not allowed, a
section permission fault is generated.


Large page or small page
If the level one descriptor defines a page-mapped access and the level two
descriptor is for a large or small page, four access permission fields, ap3
to ap0, are specified, each corresponding to one quarter of the page. For
small pages ap3 is selected by the top 1KB of the page and ap0 is selected
by the bottom 1KB of the page. For large pages, ap3 is selected by the top
16KB of the page and ap0 is selected by the bottom 16KB of the page.
The selected AP bits are then interpreted in exactly the same way as for
a section. See Table 3-12 on page 3-23. The only difference is that the
fault generated is a page permission fault.


Tiny page If the level one descriptor defines a page-mapped access, and the level
two descriptor is for a tiny page, the AP bits of the level one descriptor
define whether or not the access is allowed in the same way as for a
section. The fault generated is a page permission fault.
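
The subpage selection for large and small pages can be illustrated with a short C sketch. The names are hypothetical, and the page sizes (4KB small pages, 64KB large pages) are inferred from the 1KB and 16KB quarter sizes given above.

```c
#include <stdint.h>

enum page_type { PAGE_SMALL, PAGE_LARGE };

/*
 * Returns n such that field apn applies to the access: 0 for the bottom
 * quarter of the page, 3 for the top quarter.
 * Assumes 4KB small pages (1KB subpages) and 64KB large pages (16KB subpages).
 */
unsigned subpage_ap_index(uint32_t mva, enum page_type type)
{
    unsigned shift = (type == PAGE_SMALL) ? 10u : 14u;
    return (unsigned)((mva >> shift) & 0x3u);
}
```

The AP field selected this way is then interpreted exactly as for a section. That interpretation depends on the S and R bits and is not modeled here.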
3.6 External aborts


In addition to the MMU generated aborts, external aborts can be generated for certain
types of access that involve transfers over the AHB bus. These can be used to flag errors
on external memory accesses. However, not all accesses can be aborted in this way.


The following accesses can be externally aborted:
o page walks
o noncached reads
o nonbuffered writes
o noncached read-lock-write (SWP) sequence.


For a read-lock-write (SWP) sequence, if the read externally aborts, the write is always
attempted.


A swap to an NCB region is forced to have precisely the same behavior as a swap to an 
NCNB region. This means that the write part of a swap to an NCB region can be 
externally aborted. 


3.6.1 Enabling the MMU


Before enabling the MMU using CP15 c1 you must:


1. Program the TTB register (CP15 c2) and the domain access control register
(CP15 c3).
2. Program first-level and second-level page tables as required, ensuring that a valid
translation table is placed in memory at the location specified by the TTB register.


When these steps have been performed, you can enable the MMU by setting CP15 c1
bit 0 HIGH.


Care must be taken if the translated address differs from the untranslated address
because several instructions following the enabling of the MMU might have been
prefetched with the MMU off (VA = MVA = PA).


In this case, enabling the MMU can be considered as a branch with delayed execution.
A similar situation occurs when the MMU is disabled. Consider the following code
sequence:


    MRC p15, 0, R1, c1, c0, 0   ; Read control register
    ORR R1, R1, #0x1            ; Set M bit
    MCR p15, 0, R1, c1, c0, 0   ; Write control register and enable MMU
    Fetch Flat
    Fetch Flat
    Fetch Translated


Copyright © 2001-2008 ARM Limited. All rights reserved. ARM DDI 0198E







Note 


Because the same register, CP15 c1, controls the enabling of the ICache, DCache, and
the MMU, all three can be enabled using a single MCR instruction.
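
A minimal sketch of the value such a single write would carry is shown below. The M bit is bit 0 as stated above. The positions used for the C bit (bit 2) and I bit (bit 12) are assumptions taken from the ARM architecture, and the function name is hypothetical. The sketch only computes the register value; the MRC and MCR accesses themselves are not shown.

```c
#include <stdint.h>

/* CP15 c1 control bits. The C and I positions are assumptions. */
#define CP15_C1_M (1u << 0)   /* MMU enable */
#define CP15_C1_C (1u << 2)   /* DCache enable (assumed position) */
#define CP15_C1_I (1u << 12)  /* ICache enable (assumed position) */

/* Value to write back with one MCR so that all three are enabled together. */
uint32_t control_enable_mmu_and_caches(uint32_t control)
{
    return control | CP15_C1_M | CP15_C1_C | CP15_C1_I;
}
```

Reading the register first and modifying only these bits preserves the remaining control settings.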


3.6.2 Disabling the MMU


To disable the MMU, clear bit 0 in CP15 c1.


Note 


If the MMU is enabled, then disabled, and subsequently re-enabled, the contents of the
TLB are preserved. If these are now invalid, then the TLB must be invalidated before
re-enabling the MMU. See TLB Operations Register c8 on page 2-23.


3.7 TLB structure




The MMU contains a single unified TLB used for both data accesses and instruction 
fetches. The TLB is divided into two parts: 


o an eight-entry fully-associative part used exclusively for holding locked down 
TLB entries 


o a set-associative part for all other entries, 2 way x 32 entry. 


Whether an entry is placed in the set-associative, or lockdown part of the TLB is
dependent on the state of the TLB lockdown register, when the entry is written into the
TLB. See TLB Lockdown Register c10 on page 2-30.


When an entry has been written into the lockdown part of the TLB, it can only be
removed by being overwritten explicitly, or by an MVA-based TLB invalidate
operation, where the MVA matches the locked down entry.


The structure of the set-associative part of the TLB does not form part of the
programmer's model for the ARM926EJ-S processor. No assumptions must be made
about the structure, replacement algorithm, or persistence of entries in the
set-associative part. Specifically:


o Any entry written into the set-associative part of the TLB can be removed at any
time. The set-associative part of the TLB must be considered as a temporary cache
of translation/page table information. No reliance must be placed on an entry
either residing or not residing in the set-associative TLB, unless that entry already
exists in the lockdown TLB. The set-associative part of the TLB can contain
entries that are defined in the page tables but do not correspond to address values
that have been accessed since the TLB was invalidated.


o The set-associative part of the TLB must be considered as a cache of the
underlying page table, where memory coherency must be maintained at all times.
If a level one descriptor is modified in main memory, then to guarantee coherency
either an invalidate TLB or invalidate TLB by entry operation must be used to
remove any cached copies of the level one descriptor. This is required regardless
of the type of level one descriptor (section, level two page table reference, or
fault).

o If any of the subpage permissions for a given page are different, then each of the
subpages is treated separately. To invalidate all the entries associated with a page
with subpage permissions, four MVA-based invalidate operations are
required, one for each subpage.
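
The four addresses needed for such an invalidation can be derived as in the following sketch. The function name is hypothetical, and the sketch assumes 4KB small pages or 64KB large pages with a page-aligned base. Issuing the MVA-based invalidate operation for each address is left to the caller.

```c
#include <stdint.h>

/*
 * Fills mva[0..3] with one address inside each quarter of the page.
 * page_size is in bytes and is assumed to be 4KB (small page) or
 * 64KB (large page); page_base is assumed to be page-aligned.
 */
void subpage_invalidate_addresses(uint32_t page_base, uint32_t page_size,
                                  uint32_t mva[4])
{
    uint32_t subpage_size = page_size / 4u;
    for (uint32_t i = 0; i < 4u; i++)
        mva[i] = page_base + i * subpage_size;
}
```

Each returned address would then be passed to one MVA-based TLB invalidate operation.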
Chapter 4
Caches and Write Buffer


This chapter describes the Instruction Cache (ICache), the Data Cache (DCache), and 
the write buffer. It contains the following sections: 


o About the caches and write buffer on page 4-2
o Write buffer on page 4-4
o Enabling the caches on page 4-5
o TCM and cache access priorities on page 4-7
o Cache MVA and Set/Way formats on page 4-8.


4.1 About the caches and write buffer


The ARM926EJ-S processor includes:


o an Instruction Cache (ICache)
o a Data Cache (DCache)
o a write buffer.


The size of the caches can be from 4KB to 128KB, in power of two increments.


The caches have the following features:


o The caches are virtual index, virtual tag, addressed using the Modified Virtual
Address (MVA). This avoids cache cleaning and/or invalidating on context switch.

o The caches are four-way set associative, with a cache line length of eight words
per line, 32 bytes per line, and with two dirty bits in the DCache.

o The DCache supports write-through and write-back, or copyback, cache
operations, selected by memory region using the C and B bits in the MMU
translation tables.

o Allocate on read-miss is supported. The caches perform critical-word first cache
refilling.

o Pseudo-random or round-robin replacement, selectable by the RR bit in CP15 c1.

o Cache lockdown registers enable control over which cache ways are used for
allocation on a linefill, providing a mechanism for both lockdown and controlling
cache pollution.

o The DCache stores the Physical Address (PA) tag corresponding to each DCache
entry in the tag RAM for use during cache line write-backs, in addition to the
Virtual Address tag stored in the tag RAM. This means that the MMU is not
involved in DCache write-back operations, removing the possibility of TLB
misses related to the write-back address.

o The PLD data preload instruction does not cause data cache linefills. It is treated
as a NOP instruction.

o Cache maintenance operations to provide efficient invalidation of:
- the entire DCache or ICache
- regions of the DCache or ICache
- regions of virtual memory.
They also provide operations for efficient cleaning and invalidation of:
- the entire DCache
- regions of the DCache
- regions of virtual memory.
The latter enables DCache coherency to be efficiently maintained when small
code changes occur, for example for self-modifying code and changes to
exception vectors.


4.2 Write buffer




The write buffer is used for all writes to a noncacheable, bufferable region,
write-through region, and write misses to a write-back region. A separate buffer is
incorporated in the DCache for holding write-back data for cache line evictions or
cleaning of dirty cache lines.


The main write buffer has a 16-word data buffer and a four-address buffer.
The DCache write-back buffer has eight data word entries and a single address entry.


The MCR drain write buffer instruction enables both write buffers to be drained under
software control.


The MCR wait for interrupt causes both write buffers to be drained and the
ARM926EJ-S processor to be put into a low-power state until an interrupt occurs.


Write buffer behavior is described in Table 4-4 on page 4-6.


No forwarding takes place for read accesses that have corresponding pending writes in
the write buffer. For such accesses the write buffer is drained and the value fetched from
external memory.


4.3 Enabling the caches


On reset, the ICache and DCache entries are all invalidated and the caches are disabled.
The caches are not accessed for reads or writes. The caches are enabled using the I, C,
and M bits from CP15 c1, and can be enabled independently of one another. Table 4-1
gives the I and M bit settings for the ICache, and the associated behavior. The priority
of the TCM and cache behavior is described in TCM and cache access priorities on
page 4-7.


Table 4-1 CP15 c1 I and M bit settings for the ICache

I bit  M bit  ARM926EJ-S behavior
0      -      ICache disabled. All instruction fetches are fetched from external memory (AHB).
1      0      ICache enabled, MMU disabled. All instruction fetches are cacheable, with no
              protection checks. All addresses are flat mapped, that is VA = MVA = PA.
1      1      ICache enabled, MMU enabled. Instruction fetches are cacheable or noncacheable
              depending on the page descriptor C bit (see Table 4-2), and protection checks
              are performed. All addresses are remapped from VA to PA, depending on the page
              entry, that is the VA is translated to an MVA, and the MVA is remapped to a PA.


Table 4-2 gives the page table C bit settings for the ICache (CP15 c1 I bit = M bit = 1).

Table 4-2 Page table C bit settings for the ICache

C bit  Description   ARM926EJ-S behavior
0      Noncacheable  ICache disabled. All instruction fetches are fetched from external memory.
1      Cacheable     Cache hit   Read from the ICache.
                     Cache miss  Linefill from external memory.

Table 4-3 gives the CP15 c1 C and M bit settings for DCache, and the associated
behavior.


Table 4-3 CP15 c1 C and M bit settings for the DCache

C bit  M bit  ARM926EJ-S behavior
0      0      DCache disabled. All data accesses are to the external memory.
1      0      DCache enabled, MMU disabled. The C bit is overridden by the M bit setting, so
              that the DCache is effectively disabled. All data accesses are noncacheable,
              nonbufferable, with no protection checks. All addresses are flat mapped, that
              is VA = MVA = PA.
1      1      DCache enabled, MMU enabled. All data accesses are cacheable or noncacheable
              depending on the page descriptor C bit and B bit (see Table 4-4), and
              protection checks are performed. All addresses are remapped from VA to PA,
              depending on the MMU page table entry, that is the VA is translated to an
              MVA, and the MVA is remapped to a PA.

Table 4-4 gives the page table C and B bit settings for the DCache (CP15 c1 C bit = M
bit = 1), and the associated behavior.

Table 4-4 Page table C and B bit settings for the DCache

C bit  B bit  Description     ARM926EJ-S behavior
0      0      Noncacheable,   DCache disabled. Read from external memory. Write as a
              nonbufferable   nonbuffered store(s) to external memory. DCache is not updated.
0      1      Noncacheable,   DCache disabled. Read from external memory. Write as a
              bufferable      buffered store(s) to external memory. DCache is not updated.
1      0      Write-through   DCache enabled:
                              Read hit    Read from DCache
                              Read miss   Linefill
                              Write hit   Write to the DCache, and buffered store to external memory
                              Write miss  Buffered store to external memory
1      1      Write-back      DCache enabled:
                              Read hit    Read from DCache
                              Read miss   Linefill
                              Write hit   Write to the DCache only
                              Write miss  Buffered store to external memory.
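
The write behavior in Table 4-4 can be summarized by the small C model below. It is a sketch only: the enum and function names are hypothetical, and it covers just the case where the CP15 c1 C and M bits are both 1.

```c
#include <stdbool.h>

/* Where a data write ends up, following Table 4-4. */
enum store_path {
    STORE_NONBUFFERED,          /* nonbuffered store to external memory */
    STORE_BUFFERED,             /* buffered store to external memory */
    STORE_DCACHE_AND_BUFFERED,  /* write-through hit */
    STORE_DCACHE_ONLY           /* write-back hit */
};

/* page_c and page_b are the page table C and B bits for the address. */
enum store_path dcache_store_path(bool page_c, bool page_b, bool cache_hit)
{
    if (!page_c)
        return page_b ? STORE_BUFFERED : STORE_NONBUFFERED;
    if (!cache_hit)
        return STORE_BUFFERED;  /* write miss in either cacheable mode */
    return page_b ? STORE_DCACHE_ONLY : STORE_DCACHE_AND_BUFFERED;
}
```

Note that a write miss never allocates a line in this model, which matches the table: only read misses cause a linefill.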
4.4 TCM and cache access priorities


The priorities that apply to the ARM926EJ-S processor for instruction accesses are
shown in Table 4-5. The ARM926EJ-S processor gives highest priority to an address
that is in the instruction TCM region.


Table 4-5 Instruction access priorities to the TCM and cache

Address in    Address in    Cacheable in      ARM926EJ-S
ITCM region   DTCM region   page descriptor   behavior
Yes           Yes           Don't care        Access ITCM
Yes           No            Cacheable         Access ITCM
Yes           No            Noncacheable      Access ITCM
No            Don't care    Cacheable         Access ICache
No            Don't care    Noncacheable      Access external memory


The priorities that apply to the ARM926EJ-S processor for data accesses are shown in
Table 4-6. The Harvard arrangement for the TCM and caches requires that data
accesses, both reads and writes, can reach the instruction TCM. The column order
for Table 4-6 is deliberately the same as for instruction accesses in Table 4-5.


Table 4-6 Data access priorities to the TCM and cache

Address in    Address in    Cacheable in      ARM926EJ-S
ITCM region   DTCM region   page descriptor   behavior
Yes           Yes           Don't care        Access DTCM
No            Yes           Cacheable         Access DTCM
No            Yes           Noncacheable      Access DTCM
Yes           No            Cacheable         Access ITCM
Yes           No            Noncacheable      Access ITCM
No            No            Cacheable         Access DCache
No            No            Noncacheable      Access external memory
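
The two priority tables can be expressed as a pair of decision functions, sketched below with hypothetical names. The sketch assumes that TCM rows differing only in cacheability behave identically, so that cacheability only matters when the address is in neither TCM region that applies.

```c
#include <stdbool.h>

enum access_target {
    TARGET_ITCM, TARGET_DTCM, TARGET_ICACHE, TARGET_DCACHE, TARGET_EXTERNAL
};

/* Table 4-5: instruction fetches. The DTCM region is never consulted. */
enum access_target instruction_target(bool in_itcm, bool cacheable)
{
    if (in_itcm)
        return TARGET_ITCM;
    return cacheable ? TARGET_ICACHE : TARGET_EXTERNAL;
}

/* Table 4-6: data accesses. DTCM wins over ITCM when both regions match. */
enum access_target data_target(bool in_itcm, bool in_dtcm, bool cacheable)
{
    if (in_dtcm)
        return TARGET_DTCM;
    if (in_itcm)
        return TARGET_ITCM;
    return cacheable ? TARGET_DCACHE : TARGET_EXTERNAL;
}
```

The asymmetry is visible in the signatures: a data access can be routed to the ITCM, but an instruction fetch is never routed to the DTCM.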
4.5 Cache MVA and Set/Way formats


This section shows how the MVA and Set/Way formats of ARM926EJ-S caches map to 
a generic virtually indexed, virtually addressed cache. 


Figure 4-1 shows a generic, virtually indexed, virtually addressed cache. 


[Figure 4-1: diagram of a cache addressed by a virtual Tag and Index, producing Hit and
Read data outputs.]

Figure 4-1 Generic virtually indexed virtually addressed cache


The ARM926EJ-S cache format is shown in Figure 4-2 on page 4-9.

Figure 4-2 ARM926EJ-S cache associativity

Table 4-7 shows values of S and NSETS for an ARM926EJ-S cache.


Table 4-7 Values of S and NSETS

Cache size  S   NSETS
4KB         5   32
8KB         6   64
16KB        7   128
32KB        8   256
64KB        9   512
128KB       10  1024
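
The values in Table 4-7 follow from the cache geometry stated earlier in this chapter (four ways, 32 bytes per line). The sketch below, with hypothetical function names, reproduces them.

```c
#include <stdint.h>

#define CACHE_WAYS       4u   /* four-way set associative */
#define CACHE_LINE_BYTES 32u  /* eight words per line */

/* Number of sets for a cache of the given size in bytes. */
uint32_t cache_nsets(uint32_t cache_bytes)
{
    return cache_bytes / (CACHE_WAYS * CACHE_LINE_BYTES);
}

/* S is the base-2 logarithm of NSETS; nsets must be a power of two. */
uint32_t cache_s(uint32_t nsets)
{
    uint32_t s = 0;
    while (nsets > 1u) {
        nsets >>= 1;
        s++;
    }
    return s;
}
```

For example, a 16KB cache gives 16384 / 128 = 128 sets, so S is 7.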


Figure 4-2 shows the ARM926EJ-S cache associativity. In Figure 4-2, the following
points apply:

o the group of tags of the same Index defines a Set
o the number of tags in a Set is the Associativity
o the ARM926EJ-S caches are four-way Associative
o the range of tags addressed by the Index defines a Way
o the number of tags in a Way is the number of Sets, NSETS.


The Set/Way/Word format for ARM926EJ-S caches is shown in Figure 4-3.


Bits         Field
[31:32-A]    Way
[31-A:S+5]   SBZ
[S+4:5]      Set select (= Index)
[4:2]        Word
[1:0]        SBZ


Figure 4-3 ARM926EJ-S cache Set/Way/Word format
In Figure 4-3:
A = log2 Associativity.
For example, for a four-way cache A = 2.


S = log2 NSETS.
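
As an illustration, a Set/Way/Word value could be packed as in the sketch below. The function name is hypothetical. The field layout assumed is Way in bits [31:32-A], Set select in bits [S+4:5], and Word in bits [4:2], with all other bits zero.

```c
#include <stdint.h>

#define CACHE_A 2u  /* log2 of the associativity; 2 for a four-way cache */

/*
 * Packs way, set and word numbers into the Set/Way/Word layout.
 * s is log2(NSETS). Bits [1:0] and the bits between the Way and
 * Set select fields are left as zero (SBZ).
 */
uint32_t cache_set_way_word(uint32_t s, uint32_t way, uint32_t set, uint32_t word)
{
    uint32_t way_mask = (1u << CACHE_A) - 1u;
    uint32_t set_mask = (1u << s) - 1u;
    return ((way & way_mask) << (32u - CACHE_A)) |
           ((set & set_mask) << 5) |
           ((word & 0x7u) << 2);
}
```

Only S changes with the cache size, so the same routine serves every configuration from 4KB to 128KB.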
Chapter 5
Tightly-Coupled Memory Interface 


This chapter describes the ARM926EJ-S Tightly-Coupled Memory (TCM) interface. It 
contains the following sections: 


o About the tightly-coupled memory interface on page 5-2
o TCM interface signals on page 5-4
o TCM interface bus cycle types and timing on page 5-8
o TCM programmer's model on page 5-19
o TCM interface examples on page 5-21
o TCM access penalties on page 5-29
o TCM write buffer on page 5-30
o Using synchronous SRAM as TCM memory on page 5-31
o TCM clock gating on page 5-32.


5.1 About the tightly-coupled memory interface


The ARM926EJ-S processor enables low latency access to external memories using the
Tightly Coupled Memory (TCM) interface. The term tightly coupled memory refers to
the relationship between the ARM9EJ-S CPU core, and the operation of the memories,
where there is a strong correlation between the instruction and data access activity of
the ARM9EJ-S and the accesses made to external memory. This is in contrast to the
accesses made to the AHB interfaces, that are relatively decoupled from the ARM9EJ-S
core.


TCMs are intended for storing certain types of critical code or data, where low latency,
deterministic access is required. TCMs are not necessarily the best choice for all types
of such code or data. If code or data exhibit a high degree of spatial or temporal locality,
better performance can be obtained by using cache memory. See Chapter 4 Caches and
Write Buffer.


The ARM926EJ-S processor supports two TCM regions, one for instructions (ITCM)
and one for data (DTCM). The ITCM interface can also be accessed by the data side of
the ARM9EJ-S core. This is necessary for code to be loaded into the ITCM, for SWI
and emulated instruction handlers, and for accesses to PC-relative literal pools.


The TCM address space is physically addressed, and the location of the TCM regions
in the physical address space is controlled by the TCM Region Register. See TCM
Region Register c9 on page 2-28. The physical size of each TCM region is defined by
external inputs (IRSIZE, DRSIZE), and ranges from 4KB to 1MB. The encoding for
these pins is shown in TCM Size field encoding on page 2-29. The TCM regions can be
placed anywhere in the physical address map, with the restriction that the TCM base
address must be aligned with the TCM size, and that the instruction and data TCM
regions do not overlap. The TCM region size can be interrogated by software by reading
the TCM Status Register. See TCM Status Register c0 on page 2-11.
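
The placement restrictions can be checked with a short C sketch such as the following. The structure and function names are hypothetical, and the requirement that sizes are powers of two is an assumption of this sketch rather than a statement from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* A TCM region described by its physical base address and size in bytes. */
struct tcm_region { uint32_t base; uint32_t size; };

/* Size from 4KB to 1MB; the power-of-two requirement is an assumption. */
static bool tcm_size_ok(uint32_t size)
{
    return size >= 4096u && size <= 1048576u && (size & (size - 1u)) == 0;
}

/* Base address must be aligned to the region size. */
static bool tcm_base_ok(const struct tcm_region *r)
{
    return tcm_size_ok(r->size) && (r->base & (r->size - 1u)) == 0;
}

/* True if both regions are legal and do not overlap. */
bool tcm_layout_ok(const struct tcm_region *itcm, const struct tcm_region *dtcm)
{
    if (!tcm_base_ok(itcm) || !tcm_base_ok(dtcm))
        return false;
    uint64_t i_end = (uint64_t)itcm->base + itcm->size;
    uint64_t d_end = (uint64_t)dtcm->base + dtcm->size;
    return i_end <= dtcm->base || d_end <= itcm->base;
}
```

A check like this could be run on candidate base addresses before they are programmed into the TCM Region Register.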


The INITRAM pin enables the ARM926EJ-S processor to boot from instruction TCM
space after system reset. If INITRAM is asserted during system reset and the VINITHI
pin is deasserted, then the ARM926EJ-S processor fetches the instruction at 0x00000000
from the instruction TCM interface. If both INITRAM and VINITHI are asserted, the
first instruction fetch after reset is from 0xFFFF0000 over the AHB.


The TCM interface supports memory accesses with zero or more wait-states. The
requirement to support zero wait state accesses imposes various constraints on the TCM
sub-system design that do not apply when interfacing memories with a generic bus
interface such as AHB.


Because of timing restrictions, read accesses occur on the TCM interface without prior
qualification by the MMU. This means that all reads on the TCM interface must be
treated as being speculative, which precludes the use of read-sensitive
memory. The TCM interface contains a two entry write buffer that avoids the
requirement for stall cycles, because of the mismatch between the ARM9EJ-S native
memory interface, and the requirements for standard SRAM.


TCM accesses can be extended by using the IRWAIT/DRWAIT inputs to generate wait
states. However, the timing of these and other interface signals means that the types of
memory sub-systems that can be implemented are limited. For example, schemes that
require an address decode to determine if a wait-state must be inserted are not possible
if operating at maximum frequency.


DMA access can be performed either by using the IRWAIT/DRWAIT signals to insert
wait states during a DMA access, or by using the dedicated DMA interface that avoids
the requirement to externally multiplex critical interface signals, when single cycle
access memory is used.


5.2 TCM interface signals


The TCM interface is designed to be compatible with the timings of standard ASIC SRAM
components, enabling connection to single cycle SRAM with minimal interfacing logic
required. For standard SRAM the chip-select, address, and write data/control signals are
set up in one cycle, and the read or write operation takes place in the next cycle.


Data interface signals 


The signals in the DTCM interface can be grouped by function into four categories.
o Control signals
- DRCS
- DRWAIT
- DRIDLE
o Address and attribute signals
- DRSEQ
- DRADDR[17:0]
- DRWBL[3:0]
- DRnRW
o Data signals
- DRRD[31:0]
- DRWD[31:0]
o DMA signals
- DRDMAEN
- DRDMACS
- DRDMAADDR[17:0].


Control signals 
The control signals for the data interface are: 


DRCS 


DRCS is used to indicate that an access commences in the following cycle. For simple
zero wait state TCM systems the DRCS signal corresponds directly to a memory chip
select signal. For more complex systems DRCS corresponds to a memory request
signal.


DRWAIT


DRWAIT is used to extend a TCM transfer by inserting wait states. The timing of the
DRWAIT signal is a cycle ahead of the cycle in which the data transfer takes place. This
means that if an access is to be waited, DRWAIT must be asserted in the same cycle as
DRCS, and deasserted one cycle before the data transfer takes place.


DRIDLE 

The DRIDLE signal provides an early indication that no TCM access will take place in 
the current cycle. 

Address and attribute signals 


All of the address and attribute signals are valid when DRCS is asserted and valid. The
exception is DRSEQ, which also has a defined value during wait states, when DRCS
is not valid.


DRSEQ 


When DRCS is asserted and valid, DRSEQ indicates if the address for the current TCM
access is sequential to the previous access. During wait states DRSEQ is forced HIGH.


DRADDR[17:0] 

DRADDR is the word (32 bit) address for the transfer. 

DRnRW 

DRnRW indicates if the access is a read or a write.

DRWBL[3:0] 

DRWBL is used to indicate which byte(s) of an address must be updated for write
accesses. This is dependent on the address, the size of the transfer, and the current
endianness setting. DRWBL is b0000 for reads.

Data signals 

The data signals are: 

DRRD[31:0] 


DRRD is the read data returned by the TCM. For zero wait state systems, DRRD is
valid in the cycle after DRCS. For systems with wait states, DRRD is valid in the cycle
after DRWAIT is deasserted.


DRWD[31:0]


DRWD is the write data written into the TCM. It is valid in the same cycle as DRCS.


DMA signals 


The DMA interface enables you to generate the values of DRADDR and DRCS from 
a source external to the ARM926EJ-S processor. 


DRDMAEN 


DRDMAEN is the DMA enable signal. When asserted it indicates that the DMA values
must be used to produce DRCS and DRADDR rather than those from the internal
ARM926EJ-S TCM controller.


DRDMACS 


DRDMACS is used to generate DRCS when DRDMAEN is asserted. Because of the
way the DRDMACS signal is combined with the internal ARM926EJ-S TCM
controller, it is not valid to assert DRDMAEN without DRDMACS asserted unless the
internal TCM controller is idle (DRIDLE asserted). The relationship between these
signals is shown in Table 5-1.


Table 5-1 Relationship between DRDMAEN, DRDMACS, and DRIDLE

DRDMAEN  DRDMACS  DRIDLE  DRCS
1        1        0       1
1        0        0       Unknown
1        1        1       1
1        0        1       0
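
Table 5-1 can be read as the small decision function sketched below. The names are hypothetical, and only the rows listed in the table, where DRDMAEN is asserted, are modeled.

```c
#include <stdbool.h>

enum drcs_value { DRCS_LOW = 0, DRCS_HIGH = 1, DRCS_UNKNOWN = 2 };

/* DRCS when DRDMAEN is asserted, following Table 5-1. */
enum drcs_value drcs_with_dma_enabled(bool drdmacs, bool dridle)
{
    if (drdmacs)
        return DRCS_HIGH;
    /* Without DRDMACS the output is only defined while the controller is idle. */
    return dridle ? DRCS_LOW : DRCS_UNKNOWN;
}
```

The DRCS_UNKNOWN result corresponds to the combination that the description of DRDMACS says is not valid.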


DRDMAADDR[17:0] 


DRDMAADDR is used as the source for DRADDR whenever DRDMAEN is
asserted.


Instruction TCM signals 


The instruction side TCM signals are almost identical to the DTCM signals. All the 
signals on the DTCM have an equivalent on the instruction side. 


o Control signals
- IRCS
- IRWAIT
- IRIDLE
o Address and attribute signals
- IRSEQ
- IRADDR[17:0]
- IRWBL[3:0]
- IRnRW
o Data signals
- IRRD[31:0]
- IRWD[31:0]
o DMA signals
- IRDMAEN
- IRDMACS
- IRDMAADDR[17:0].


5.2.3 Differences between DTCM and ITCM 


There are differences between the DTCM and ITCM interfaces:

o DMA to ITCM must not occur unless IRIDLE is asserted.

o Only back-to-back transfers on the DTCM can be marked as sequential. On the
ITCM, idle cycles can occur before requests marked as sequential.

o Sequential write transfers do not occur on the ITCM.

o The ARM926EJ-S processor does not support simultaneous access by ITCM and
DMA. DMA access must only take place when you know that the ARM926EJ-S
processor cannot execute code from the ITCM. The TCM interface indicates this
by either the IRIDLE or STANDBYWFI signals.


5.3 TCM interface bus cycle types and timing


The TCM bus interface is pipelined to enable back-to-back accesses to TCM memory
with zero wait states. For each TCM access there is one request cycle and one or more
data cycles. Figure 5-1 shows a multi-cycle data side TCM access.


[Figure 5-1: timing diagram showing CLK, DRCS, DRADDR[17:0], DRWBL[3:0], DRWD[31:0],
DRSEQ, DRWAIT, and DRRD[31:0] across the cycles request A, data A-1 to data A-n, and
request B, with read data valid in the last data cycle.]

Figure 5-1 Multi-cycle data side TCM access


The first cycle is a request cycle, request A, where all of the TCM interface output
signals are valid. The TCM subsystem responds on DRWAIT, indicating that the access
will not complete in the following cycle. The cycle following the request cycle, data
A-1, is the first waited data cycle. In this cycle the values of DRADDR, DRnRW,
DRWBL, and DRWD are no longer valid and their value is non-deterministic, and
DRSEQ is asserted. As in the request cycle DRWAIT indicates if the access will
complete in the following cycle. In the penultimate data cycle, data A-n-1, DRWAIT is
deasserted indicating that the access will complete in the next cycle. If the last data cycle
of the access, data A-n, is a read then DRRD contains valid read data. Because of the
pipelined nature of the interface, the last data cycle of one access can overlap a request
cycle of the next access.
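
The pipelined wait protocol described above can be modeled by counting data cycles from a trace of DRWAIT samples. This is a simplified sketch with a hypothetical function name. It assumes one sample per cycle, starting with the request cycle.

```c
#include <stddef.h>

/*
 * drwait[0] is the DRWAIT value sampled in the request cycle, and
 * drwait[i] the value sampled in the i-th data cycle. Returns the
 * number of data cycles the access takes, or 0 if DRWAIT is still
 * asserted at the end of the trace.
 */
size_t tcm_data_cycles(const int *drwait, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!drwait[i])
            return i + 1;  /* the access completes in the cycle after this sample */
    }
    return 0;
}
```

A zero wait state access therefore has DRWAIT deasserted in the request cycle and takes exactly one data cycle.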


5.3.1 Zero wait state timing


For zero wait state accesses the timing of the TCM interface corresponds to the timing
of a standard SRAM component, with minimal interfacing logic required. Figure 5-2 on
page 5-9 shows examples of zero wait state accesses on the ITCM interface
corresponding to instruction fetches. All accesses are reads.

[Figure 5-2: timing diagram for cycles T1 to T7 showing IRCS, IRSEQ, IRADDR, and IRRD.]

Figure 5-2 Instruction side zero wait state accesses

In cycle T1, a nonsequential request is made to address A.

In cycle T2, a sequential request is made to A+1 and data for the access to A is returned.

In cycle T3, no request is made and data is returned for the access to A+1.

In cycle T4, a sequential request is made to A+2.

In cycle T5, a nonsequential request is made to address B and data is returned for the
access to A+2.

In cycle T6, a nonsequential request is made to address C and data is returned for the
access to B.


Note 


Cycles of a sequential request cycle do not necessarily occur in consecutive bus cycles,
for the ITCM interface. Any number of idle request cycles can occur between two
requests, with the second request marked as being sequential. The DTCM interface only
produces sequential requests during consecutive bus cycles.





Figure 5-3 on page 5-10 shows examples of data side zero wait state accesses.

[Figure 5-3: timing diagram for cycles T1 to T7 showing DRCS, DRSEQ, DRnRW, DRADDR,
DRRD, DRWD, and DRWBL.]

Figure 5-3 Data side zero wait state accesses

In cycle T1, a nonsequential read request is made to address A.

In cycle T2, a nonsequential word write request is made to address B and data is
returned for the access to A.

In cycle T3, no request is made.

In cycle T4, a nonsequential read request is made to address C.

In cycle T5, a sequential read request is made to address C+1 and data is returned for
the access to C.

In cycle T6, a nonsequential byte write request is made to address D.


5.3.2 DMA access to zero wait state TCM 


For DMA accesses to zero wait state memories, the TCM DMA interface can be used. 
This enables an alternative source of address and chip-select to be passed through to the 
TCM memories without impacting timing. Figure 5-4 on page 5-11 shows the
relationship between DRDMAEN, DRDMACS, DRDMAADDR, DRADDR and
DRCS.

[Figure 5-4: logic diagram combining DRDMAADDR, DRDMAEN, and DRDMACS with the
internal address and chip-select sources to produce DRADDR and DRCS.]

Figure 5-4 Relationship between DRDMAEN, DRDMACS, DRDMAADDR, DRADDR and DRCS

Internal to the ARM926EJ-S processor there are multiple sources for both the address
and chip-select outputs. The address and chip-select outputs of the TCM interface are
timing critical, however not all of the internal sources are timing critical. By combining
the DMA inputs with non-critical address and chip-select signals, DMA can be done
without impacting timing on these outputs. All other TCM interface outputs are not
timing critical, and can be multiplexed externally.


The logic used to combine the DMA chip-select with the internal chip-select signals is
designed so that if the DMA inputs are selected then the DMA chip-select is also
asserted. If this is not the case then the chip-select output value is non-deterministic
unless it is known that the TCM interface is in an idle state, as indicated by the DRIDLE
or STANDBYWFI signals.


Figure 5-5 on page 5-12 shows an example of how DMA accesses interact with normal
DTCM accesses.

[Figure 5-5: timing diagram for cycles T1 to T6 showing CLK, DRDMAEN, DRCS, DRDMACS,
DRADDR, DRDMAADDR, DRSEQ, and DRIDLE.]

Figure 5-5 DMA access interaction with normal DTCM accesses


In cycle T1, the ARM926EJ-S internal TCM controller is idle and DRIDLE is asserted.
DRDMAEN is asserted, and consequently the value of DRDMAADDR is propagated
onto DRADDR, and DRCS is asserted (DRDMACS = 1). DRSEQ is forced LOW.

In cycle T2, the ARM926EJ-S internal TCM controller is no longer idle, and DRIDLE
is deasserted. A nonsequential request is made to address B.

In cycle T3, a sequential request is made to address B+1 and DRSEQ is asserted.

In cycle T4, the ARM926EJ-S internal TCM controller attempts to output values
corresponding to a sequential request to address B+2. DRDMAEN is asserted, and the
values of DRADDR and DRSEQ change accordingly. The ARM926EJ-S TCM
controller is stalled.

In cycle T5, DRDMAEN is deasserted and the ARM926EJ-S TCM controller re-issues
the request to address B+2. Because of the intervening DMA access, DRSEQ is
deasserted for the repeated request.

In cycle T6, a sequential request is made to address B+3 and DRSEQ is re-asserted.

DMA accesses can be made to the ITCM using the IRDMAEN, IRDMACS, and
IRDMAADDR signals but, unlike the DTCM, simultaneous access by the
ARM926EJ-S and DMA is not supported. This means that ITCM DMA must not take
place while executing code from the ITCM.


Copyright © 2001-2008 ARM Limited. All rights reserved. ARM DDI 0198E




5.3.3 Multi-cycle access timing 

If non zero wait state memory is used for TCM, then the DRWAIT/IRWAIT signals
are used to wait the ARM926EJ-S. The wait information for a data cycle is pipelined so
that the value of DRWAIT/IRWAIT pertains to the following data cycle. This
corresponds to the request cycle for the first data cycle. If there is no active TCM access
then the value on DRWAIT/IRWAIT is ignored. This enables you to generate the wait
signals speculatively.


Figure 5-6 shows how the speculative generation of IRWAIT can be used to generate a
single wait state for every ITCM access.


[Timing diagram over cycles T1 to T6 showing IRCS, IRWAIT, IRADDR, and IRRD.]

Figure 5-6 Generating a single wait state for ITCM accesses using IRWAIT

In cycle T1, IRWAIT is asserted but no request is made.

In cycle T2, IRWAIT is asserted and a request is made.

In cycle T3, IRWAIT is deasserted indicating that the access to A will complete in the
following cycle.

In cycle T4, IRWAIT is asserted and a request is made. The access to A completes.

In cycle T5, IRWAIT is deasserted indicating that the access to B will complete in the
following cycle.

In cycle T6, IRWAIT is asserted. No request is made. The access to B completes.

The logic required for the example shown in Figure 5-6 corresponds to the two-state
state machine shown in Figure 5-7 on page 5-14.

[State diagram with two states, WAIT and COMPLETE, and transitions labeled IRCS = 0 and IRCS = 1.]

Figure 5-7 State machine for generating a single wait state

In the WAIT state IRWAIT is asserted. In the COMPLETE state IRWAIT is
deasserted.
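
The two-state machine can be modeled in a few lines. The sketch below is an assumption-laden illustration: the transition rules (stay in WAIT while IRCS = 0, move to COMPLETE when IRCS = 1, always return from COMPLETE to WAIT) are inferred from the cycle-by-cycle description of Figure 5-6, and all Python names are hypothetical.

```python
WAIT, COMPLETE = "WAIT", "COMPLETE"

def next_state(state, ircs):
    """Advance the wait-state generator by one clock."""
    if state == WAIT:
        return COMPLETE if ircs else WAIT
    return WAIT  # COMPLETE always returns to WAIT (inferred)

def irwait(state):
    """IRWAIT is asserted in WAIT and deasserted in COMPLETE."""
    return 1 if state == WAIT else 0

def trace(ircs_per_cycle, state=WAIT):
    """Return the IRWAIT value driven in each cycle."""
    values = []
    for ircs in ircs_per_cycle:
        values.append(irwait(state))
        state = next_state(state, ircs)
    return values
```

Feeding the IRCS pattern of cycles T1 to T6 (no request, request, waited, request, waited, no request) reproduces the IRWAIT sequence described for Figure 5-6. The IRCS value in the waited cycles is not used by the model.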


Certain types of memories can have different access penalties depending on whether an
access is sequential or nonsequential. The IRSEQ/DRSEQ signals indicate if an access
is sequential in the request cycle for an access, and are held HIGH during waited cycles.
This behavior enables a loopback arrangement, where the SEQ output can be fed
directly back into the WAIT input through an inverter to produce a single cycle wait
state for nonsequential accesses as shown in Figure 5-8.
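
As a rough illustration of the loopback arrangement, the inverter and its effect on the number of request cycles can be written as follows. The helper names are hypothetical, and the cycle count is a simplification that ignores idle cycles and DMA.

```python
def loopback_wait(seq):
    """WAIT input driven from the SEQ output through an inverter."""
    return 0 if seq else 1

def request_cycles(accesses):
    """Count request cycles for a list of 'N' (nonsequential) and 'S' (sequential).

    A nonsequential request is waited for one cycle, during which SEQ is held
    HIGH, so it occupies two request cycles. A sequential request occupies one.
    """
    return sum(2 if kind == "N" else 1 for kind in accesses)
```

For example, a nonsequential access followed by three sequential accesses occupies five request cycles in this model.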








[Block diagram showing the SEQ-to-WAIT loopback, with IRADDR[17:0] and IRRD[31:0].]

Figure 5-8 Loopback of SEQ to produce a single cycle wait state


The cycle timing of the circuit shown in Figure 5-8 is shown in Figure 5-9 on page 5-15.

[Timing diagram over cycles T1 to T7. Recoverable signal labels: IRCS, IRRD.]

Figure 5-9 Cycle timing of loopback circuit

In cycle T1, a nonsequential request is made to address A and IRWAIT is asserted.

In cycle T2, IRSEQ is asserted because of the wait-state. IRWAIT is deasserted. IRCS
is unknown.

In cycle T3, the access to A completes and a sequential request is made to A+1. IRSEQ
is HIGH and IRWAIT is LOW.

In cycle T4, the access to A+1 completes. No new request is issued. The values of
IRSEQ and IRWAIT are unknown.

In cycle T5, a nonsequential request is made to address B and IRWAIT is asserted.

In cycle T6, IRSEQ is asserted because of the wait-state. IRWAIT is deasserted. IRCS
is unknown.

In cycle T7, the access to B completes.


For systems that also require DMA access to non zero wait state memories, the WAIT
signal is used to stall the ARM926EJ-S processor for both wait states and DMA
arbitration. The information required to perform an access is only valid during the
request cycle for that access. If a TCM access is postponed because of DMA, this
information must be captured at the end of the request cycle.

Figure 5-10 on page 5-16 shows an example of a system where DMA access is required
to a memory that has a single wait state for nonsequential accesses.

[Block diagram. Recoverable signal labels: FORCE_NSEQ, DRWAIT, DMAWAIT, DRSEQ, DRCS, DRADDR[17:0], DRWBL[3:0], DRnRW, DRWD[31:0], DRRD[31:0].]

Figure 5-10 DMA with single wait state for nonsequential accesses


The logic used to generate DRWAIT uses both the loopback scheme using DRSEQ for
inserting a wait state for a nonsequential request, and an additional signal, DMAWAIT,
for stalling during DMA accesses. The FORCE_NSEQ signal is an override signal
used to force the ARM926EJ-S access to be treated as nonsequential because of an
intervening DMA access.
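
One plausible way to combine the three contributions is a simple OR, sketched below. The exact gate structure of Figure 5-10 is not recoverable here, so treat the combination and the function name as assumptions.

```python
def drwait(drseq, dmawait, force_nseq):
    """Assumed DRWAIT generation: loopback wait OR DMA stall OR forced nonsequential."""
    loopback = 0 if drseq else 1      # one wait state for a nonsequential request
    return 1 if (loopback or dmawait or force_nseq) else 0
```

With this combination, a sequential request is stalled while DMAWAIT is HIGH, sees one further wait cycle while FORCE_NSEQ is HIGH, and proceeds once both are LOW.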


The A, WE, nRW and WD inputs to the TCM are either sourced directly from the
ARM926EJ-S TCM interface, from the DMA controller, or from the capture register,
clocked by REQCLK, if the ARM926EJ-S access is postponed because of DMA
activity.

The cycle timing of the circuit shown in Figure 5-10 is shown in Figure 5-11 on
page 5-17.

[Timing diagram over cycles T1 to T9. Recoverable signal labels: DRWAIT, DMAWAIT, FORCE_NSEQ, REQCLK, CS, SEQ, DRRD.]

Figure 5-11 Cycle timing of circuit with DMA and single wait state for nonsequential accesses


In cycle T1, the ARM926EJ-S initiates a sequential request to address A and the DMA
gains ownership of the TCM. DRWAIT is asserted because of DMAWAIT. The CS, A,
and WE signals for the TCM are sourced from the DMA. The values of DRADDR,
DRWBL and DRnRW are registered.

In cycle T2, the DMA access is still active (two cycle nonsequential access). DRWAIT
is held HIGH because of DMAWAIT.

In cycle T3, the DMA access completes and DMAWAIT is deasserted. The access
attributes captured at the end of T1 are used to generate the CS, A and WE signals for
the TCM. DRWAIT is asserted because of FORCE_NSEQ.

In cycle T4, FORCE_NSEQ is deasserted causing DRWAIT to be deasserted
indicating that the access will complete in the next cycle.

In cycle T5, the access to A completes. A sequential request is made to A+1. There is
no DMA activity.

In cycle T6, the access to A+1 completes. A sequential request is made to A+2. There
is no DMA activity.

In cycle T7, the access to A+2 completes. No request is made and DRCS is deasserted.
A DMA access to address C starts and DRWAIT is asserted using DMAWAIT.

In cycle T8, DRWAIT remains HIGH because of DMA access. No request is made, and
DRCS remains LOW.

In cycle T9, the DMA access to C completes. A nonsequential request is made to
address D.

5.4 TCM programmer's model

After reset, the behavior of the TCMs is controlled by the state of the TCM Region
Register, CP15 c9.


Enabling the ITCM 


The ITCM can automatically be enabled at reset using the INITRAM pin. If
INITRAM is held HIGH during system reset, and the VINITHI pin is deasserted, the
ITCM is enabled with the ITCM region base set to 0x00000000. This enables you to run boot
code from the ITCM. Boot code must be pre-loaded into the TCM for this to be useful.

If INITRAM is LOW during system reset and the ITCM is disabled, the ITCM can be
enabled by writing to the ITCM Region Register. See TCM Region Register c9 on
page 2-28.

Note

If INITRAM = 1 and VINITHI = 1, the ITCM is enabled at system reset but the
ARM926EJ-S processor boots from 0xFFFF0000.


Enabling the DTCM 
Unlike the ITCM there is no way of automatically enabling the DTCM at reset. The
DTCM can only be enabled by writing to the DTCM Region Register. See TCM Region
Register c9 on page 2-28.

Disabling the ITCM
Disable the ITCM by clearing bit 0 of the ITCM Region Register, CP15 c9. This register
must be written using a read-modify-write operation.

Disabling the DTCM
Disable the DTCM by clearing bit 0 of the DTCM Region Register, CP15 c9. This
register must be written using a read-modify-write operation.
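
The read-modify-write requirement means that only the enable bit changes while every other field of the region register keeps its current value. The sketch below shows just the bit arithmetic on a 32-bit value. The helper names and the example register value are hypothetical, and the actual CP15 read and write instructions are not shown.

```python
ENABLE_BIT = 1 << 0  # bit 0 of the ITCM or DTCM Region Register

def disable_tcm(region_value):
    """Return the value to write back: bit 0 cleared, all other bits preserved."""
    return region_value & ~ENABLE_BIT & 0xFFFFFFFF

def enable_tcm(region_value):
    """Return the value to write back: bit 0 set, all other bits preserved."""
    return (region_value | ENABLE_BIT) & 0xFFFFFFFF

# Example with an arbitrary, made-up register value read from CP15 c9.
current = 0x00010019
new_value = disable_tcm(current)
```

In real code the value passed in must come from reading the register first, and the result is then written back.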

Cacheable and bufferable attributes 


All MMU page table entries used to map TCM address space must be marked
noncacheable. This is required for forward compatibility.

Note

If HRESETn is asserted asynchronously during a warm reset, you must ensure
that all writes to the TCM have been completed and the TCM interfaces are idling
at the time of HRESETn assertion. This ensures that there are no pending writes
in the write buffer that might be lost in the event of a warm reset, and prevents loss
or corruption of TCM contents by an existing write on the TCM interface.

To achieve this, you must:

1. execute a drain write buffer instruction
2. enter Standby WFI mode
3. assert HRESETn.

Contact ARM Limited for more details.

5.5 TCM interface examples


This section contains the following examples: 

o Zero-wait-state RAM example 

o Producing byte writable memory using word writable RAM 
o Multiple banks of RAM example on page 5-22. 


Note 


Most of the examples in this section are for the DTCM interface. These are also
applicable to the ITCM interface.

The additional logic required for implementing the examples in this section is the
responsibility of the implementer.


5.5.1 Zero-wait-state RAM example 


Figure 5-12 shows the simplest RAM interface where the RAM block is constructed
from a single word-wide RAM that has byte write control. The TCM interface can
connect directly to the RAM block. This is a zero-wait-state memory so DRWAIT is
tied LOW.


[Block diagram: the ARM926EJ-S TCM interface connected directly to a single 32KB RAM with DOUT[31:0].]

Figure 5-12 Zero wait state RAM example


5.5.2 Producing byte writable memory using word writable RAM 

If byte-write RAM is not available, four banks of byte-wide RAM must be used as
shown in Figure 5-13 on page 5-22.

The rules for connecting four RAM blocks are:
o Each byte-wide RAM has the same address and chip-select control as the
word-wide RAM.
o The following connections must be made:
    DRWBL[0], DRWD[7:0], and DRRD[7:0] connect to RAM byte 0
    DRWBL[1], DRWD[15:8], and DRRD[15:8] connect to RAM byte 1
    DRWBL[2], DRWD[23:16], and DRRD[23:16] connect to RAM byte 2
    DRWBL[3], DRWD[31:24], and DRRD[31:24] connect to RAM byte 3.

[Block diagram: the ARM926EJ-S TCM interface connected to four byte-wide 32K RAMs. Recoverable signal labels: DRWD[31:0], DRADDR[17:0], DRWBL[3:0], DRSIZE[3:0], DRWAIT, DRnRW, DRCS, DRRD[31:0], CLK.]

Figure 5-13 Byte-banks of RAM example


Note 


In little-endian mode, DRWBL[0] indicates the LSB of the word and DRWBL[3]
indicates the MSB. In big-endian mode, DRWBL[3] indicates the LSB of the word and
DRWBL[0] indicates the MSB.
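
The byte-lane connections can be illustrated with a small model of a word made from four byte-wide RAMs. This is a sketch under stated assumptions: the function name is hypothetical, and each DRWBL bit is taken to enable the write of the matching DRWD byte lane, as in the connection list for Figure 5-13.

```python
def write_word(stored, drwd, drwbl):
    """Update a 32-bit stored word using the DRWBL[3:0] byte-lane strobes.

    Lane n covers bits [8n+7:8n], matching DRWBL[n] / DRWD[8n+7:8n].
    """
    result = stored
    for lane in range(4):
        if (drwbl >> lane) & 1:
            mask = 0xFF << (8 * lane)
            result = (result & ~mask) | (drwd & mask)
    return result & 0xFFFFFFFF
```

Only the lanes whose strobe is set change, so a byte write leaves the other three byte-wide RAMs untouched.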


5.5.3 Multiple banks of RAM example

If you have to create a large memory out of smaller RAM blocks, there are two methods
for doing this:

o If minimizing power consumption is more important than a fast design, you must
follow the example in Optimizing for power on page 5-23.

o If a fast design is more important than minimizing power consumption, you must
follow the example in Optimizing for speed on page 5-24.

The rules for producing memory out of smaller RAM blocks are:

o There must be an even number of RAM blocks b, for example b = 2, 4, or 8.

o Each RAM block must be the same size.

o If the address width of the required memory size is n bits, the address port of the
smaller RAM blocks is m = n - (log b / log 2) bits wide.

o Address bits [m-1:0] are applied to all the RAM blocks.

o Address bits [n-1:m] are gated with DRCS for a power optimized solution, or
with IRnRW for a speed optimized solution.

o Pipelined address bits [n-1:m] are used to select the correct RAM read data.


Optimizing for power 


Figure 5-14 shows how to produce a large memory from two smaller RAM blocks if
you are optimizing for power. Separate chip select control is required for each RAM
block:

CS bank0 = ~DRADDR[14] & DRCS
CS bank1 = DRADDR[14] & DRCS

This ensures that only the RAM being accessed is enabled, minimizing power
consumption.
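
The address-width rule and the power-optimized chip-select decode can be checked with a short sketch. The function names are hypothetical, and the bank-select bit defaults to DRADDR[14] to match the two-bank example.

```python
import math

def block_address_width(n, b):
    """Address width m of each smaller RAM block: m = n - log2(b)."""
    return n - int(math.log2(b))

def chip_selects(draddr, drcs, bank_bit=14):
    """Return (CS bank0, CS bank1) for the power-optimized two-bank case."""
    select = (draddr >> bank_bit) & 1
    cs_bank0 = drcs & (1 - select)   # ~DRADDR[14] & DRCS
    cs_bank1 = drcs & select         #  DRADDR[14] & DRCS
    return cs_bank0, cs_bank1
```

For two blocks forming a memory with a 15-bit word address, each block has a 14-bit address port, which matches the A[13:0] ports in the example.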


[Block diagram: the ARM926EJ-S TCM interface connected to two 64KB RAM banks, with DRADDR[13:0] applied to both banks and DRADDR[14] used for bank selection. Recoverable signal labels: DRWD[31:0], DRADDR[17:0], DRWBL[3:0], DRIDLE, DRSEQ, DRWAIT, DRnRW, DRRD[31:0], CLK.]

Figure 5-14 Optimizing for power

Optimizing for speed

Figure 5-15 shows how to produce a large memory from two smaller RAM blocks if
you are optimizing for speed. Separate write enable control is required for each RAM
block:

WE bank0 = ~DRADDR[14] & DRnRW
WE bank1 = DRADDR[14] & DRnRW

No logic is added to the critical DRCS path, but both RAMs are enabled whenever
DRCS is asserted, resulting in higher power consumption.


[Block diagram: the ARM926EJ-S TCM interface connected to two 64KB RAM banks, with DRADDR[13:0] applied to both banks and DRADDR[14] used for bank selection. Recoverable signal labels: DRWD[31:0], DRADDR[17:0], DRWBL[3:0], DRnRW, DRSIZE[3:0], DRSEQ, DRWAIT, DRRD[31:0].]

Figure 5-15 Optimizing for speed


5.5.4 Sequential ROM example

The diagram in Figure 5-16 on page 5-25 shows an example of a TCM sub-system that
uses wait states for nonsequential accesses. The ROM used to hold instructions can
cycle at the same frequency as the ARM926EJ-S processor it is interfaced to. However,
the memory access time for the ROM, the time from chip-select/address to data out, is
not fast enough to be directly interfaced to the ARM926EJ-S processor.

[Block diagram: the ARM926EJ-S ITCM interface connected to a ROM, returning data on IRRD[31:0].]

Figure 5-16 TCM subsystem that uses wait states for nonsequential accesses


The address and chip-select inputs to the ROM are pipelined with respect to the
ARM926EJ-S TCM interface outputs. An address incrementer is used to generate
sequential addresses. The output of the incrementer is captured at the end of every cycle
where the ROM CS chip select is active. The address source for the ROM is switched
over to the registered version of IRADDR when a nonsequential access occurs.

Figure 5-17 on page 5-26 shows the timing of the ROM address, chip-select, and read
data relative to the ARM926EJ-S TCM interface signals. The address supplied to the
ROM can either be behind, in sync with, or ahead of IRADDR, depending on the type
of memory access and the presence of idle cycles.

[Timing diagram. Recoverable signal labels: IRSEQ, IRWAIT, IRADDR, CS, RD, IRRD.]

Figure 5-17 Cycle timing of circuit that uses wait states for nonsequential accesses


5.5.5 DMA interface example 


Figure 5-18 on page 5-27 shows an example TCM subsystem using the DMA interface.
The signal driving DRDMAEN is connected to both the DRDMAEN and DRDMACS
inputs. It is also used to control the multiplexing of the non timing critical signals
(WBL, nRW, and WD), although this is not shown for clarity.







[Block diagram: the ARM926EJ-S TCM interface, a DMA controller, and an SRAM. Recoverable signal labels on the DMA side: DMAADDR[31:0], DRDMAEN, DMAWD[31:0], DMAnRW, DMAWBL[3:0], DMARD[31:0]. On the processor side: DRDMAADDR[17:0], DRDMAEN, DRDMACS, DRRD[31:0], DRWBL[3:0], DRnRW, DRWD[31:0], DRADDR[17:0], DRCS, DRWAIT, DRSEQ.]

Figure 5-18 TCM subsystem that uses the DMA interface


5.5.6 Integrating RAM test logic 

The memory used to implement TCM might require some form of test access, typically
by a BIST controller. Generally this is done by adding a collar of multiplexors around
the memory inputs. However, this method adds undesirable delays to the chip select and
address signals. This can be avoided by using the DMA interface to perform the
multiplexing of address and chip-select values. This is shown in Figure 5-19 on
page 5-28.

[Block diagram: the ARM926EJ-S TCM interface, a BIST controller, and an SRAM, with separate HRESETn and BISTRSTn resets. Recoverable signal labels on the BIST side: BISTADDR[17:0], BISTEN, BISTCS, BISTWD[31:0], BISTnRW, BISTWBL[3:0], BISTRD[31:0]. On the processor side: DRDMAADDR[17:0], DRDMAEN, DRDMACS, DRWBL[3:0], DRnRW, DRWD[31:0], DRADDR[17:0], DRCS, DRWAIT, DRSEQ.]

Figure 5-19 TCM test access using BIST


This is similar to the previous DMA example. However, for BIST testing it is necessary
for the BIST controller to be able to force the memory chip select to both HIGH and
LOW values. This requirement means that it is necessary to hold the ARM926EJ-S core
in such a state that the internal value of the chip select is guaranteed to be LOW. This
can be done by holding the ARM926EJ-S in reset, HRESETn LOW, during TCM
memory BIST testing.

Note

HRESETn cannot also be used as a reset control to the BIST controller.







5.6 TCM access penalties 


The data side of the ARM926EJ-S core can access the ITCM. To maximize the
performance of the ITCM, data read accesses to the ITCM are pipelined. The
ARM926EJ-S core is stalled for two cycles to enable the pipeline read to complete. This
is the only ARM926EJ-S TCM interface stall scenario. The inclusion of a write buffer
in the TCM controller has eliminated all other sources of potential stalling for zero wait
state TCM.

5.7 TCM write buffer

Each TCM interface has a two word entry write buffer. This is required to de-pipeline
the address and data values produced by the ARM9EJ-S core so that non-speculative
writes to memory with SRAM characteristics can be performed without
introducing stall cycles.

The ARM9EJ-S core read requests take priority over writes, and consequently TCM
transactions can be out of order with respect to instruction execution. If a read access
occurs to a location that also has a corresponding entry in the write-buffer, then data is
forwarded from the write-buffer. If it is necessary to ensure that all outstanding writes
have completed on the TCM interface then the CP15 drain write buffer instruction can
be used (MCR p15, 0, Rd, c7, c10, 4). This instruction does not complete execution
until all outstanding buffered writes, TCM and AHB, have been completed.
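
The forwarding and drain behavior can be pictured with a toy software model. This is only a sketch: the class and method names are hypothetical, and retiring the oldest entry when the buffer is full is a simplifying assumption rather than the documented hardware policy.

```python
class TcmWriteBufferModel:
    """Toy model of a two-entry TCM write buffer with read forwarding."""

    DEPTH = 2

    def __init__(self, memory):
        self.memory = memory   # dict: word address -> data
        self.entries = []      # pending (address, data) pairs, oldest first

    def write(self, address, data):
        if len(self.entries) == self.DEPTH:
            self._retire_oldest()          # assumption: make room by retiring
        self.entries.append((address, data))

    def read(self, address):
        # Forward from the newest matching buffered write, if any.
        for addr, data in reversed(self.entries):
            if addr == address:
                return data
        return self.memory.get(address, 0)

    def drain(self):
        """Model of the drain write buffer operation: retire every pending write."""
        while self.entries:
            self._retire_oldest()

    def _retire_oldest(self):
        addr, data = self.entries.pop(0)
        self.memory[addr] = data
```

In the model, a read that follows a buffered write to the same address returns the new data even though the memory itself is not updated until the entry retires or the buffer is drained.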


To guarantee that the TCM write buffers have been drained and that all outstanding
requests on the TCM interface have completed, a drain write buffer instruction must be
used prior to disabling either of the TCM regions.

5.8 Using synchronous SRAM as TCM memory

If you use SRAM to implement TCM memory, then your library RAM must meet the
following requirements:

o It must be synchronous. All timings must be relative to the rising clock edge.

o It must have a chip select, RAM enable.

o The RAM outputs must always be valid. They must not be tristated.

o Byte write control is required.

o RAM setup times must be less than 10-15% and access times must be less than
40-50% of the target cycle time. Violation of these requirements results in a
slower design. Setup and access times can be balanced by skewing the clock to
the RAM.
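
As a worked example of the timing guideline, the percentages can be turned into nanosecond budgets for a chosen clock frequency. The function name and the 200MHz figure are illustrative assumptions, not values from this manual.

```python
def ram_timing_budget_ns(clock_mhz):
    """Return the cycle time and the (low, high) bounds of the setup and access limits."""
    cycle = 1000.0 / clock_mhz
    return {
        "cycle": cycle,
        "setup_limit": (0.10 * cycle, 0.15 * cycle),    # 10-15% of the cycle
        "access_limit": (0.40 * cycle, 0.50 * cycle),   # 40-50% of the cycle
    }

budget = ram_timing_budget_ns(200.0)  # hypothetical 200MHz target
```

For a 5ns cycle this gives a setup limit between 0.5ns and 0.75ns and an access limit between 2ns and 2.5ns.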


Ideally each TCM can be constructed from single RAM blocks. However, this is not
always possible for the following reasons:

o If your RAM does not have byte write control, you must construct the word-wide
RAM out of four byte-wide RAMs. See Producing byte writable memory using
word writable RAM on page 5-21.

o If your compiler cannot produce a single RAM block that is the required size, or
if a single RAM block does not meet the timing requirements. In these cases, you
must produce the RAM out of two or more blocks of smaller RAM. See Multiple
banks of RAM example on page 5-22.

Ideally, your RAM block can connect directly to the ARM926EJ-S TCM interface.
However, this is not always possible, and additional logic is required in the following
cases:

o All TCM signals are driven as active HIGH. If your RAM requires active LOW
signals, you must add inverters to create the active LOW signals.

o If power control logic is required.

o If a RAM is non single-cycle, or hardware DMA arbitration is required, logic is
required to drive the appropriate wait signal.


Note 


DRADDR is always a word address. DRWBL is used as a byte lane strobe to select the 
appropriate byte of the addressed word on writes. Reads are always word-wide. 

5.9 TCM clock gating

If the ARM926EJ-S processor is not currently running code from a TCM region, the
idle signal for that TCM (DRIDLE for DTCM, IRIDLE for ITCM) is asserted. This
indicates that a TCM access is not performed in that cycle, enabling you to stop the
TCM clock. If no clock stopping is required, you can ignore the idle signals.

You can also use the idle signal to disable power to the RAMs if you require more
stringent power control. Removing the RAM power invalidates the RAM contents so
you must only do this if the TCMs are not being used and do not contain valid data.

This chapter describes the ARM926EJ-S Bus Interface Unit (BIU). It contains the
following sections:

o About the bus interface unit on page 6-2
o Supported AHB transfers on page 6-3.

6.1 About the bus interface unit


The ARM926EJ-S Bus Interface Unit (BIU) arbitrates and schedules AHB requests.
The BIU contains separate masters for both instruction and data access enabling
complete AHB system flexibility. Separate masters enable multi-layer AHB (see the
Multi-layer AHB Overview) and multi-AHB systems to be implemented, giving the
benefit of increased overall bus bandwidth and a more flexible system architecture.
Each master is a fully compliant AHB bus master and implements the master functions
as defined in the AMBA Specification (Rev 2.0).

To increase system performance, write buffers are used to prevent AHB writes stalling
the ARM926EJ-S system. For more details, see Chapter 4 Caches and Write Buffer.

The data BIU AHB signals are prefixed with D, and the instruction BIU signals are
prefixed with I.

6.2 Supported AHB transfers

The ARM926EJ-S processor supports a subset of AHB transfers. The permitted AHB
transfers are described in:


o Memory map 

o Transfer size 

o Mapping of level one and level two AHB attributes on page 6-5 
o Byte and halfword accesses on page 6-5 

o AHB system considerations on page 6-6 

o AHB clocking on page 6-10. 


6.2.1 Memory map 


The ARM926EJ-S processor is a cached processor with two AHB interfaces. It is a key
system design issue that the D side must be able to access the same memory as the I
side, with the same memory map. This is required not only to load code, but to enable
access to PC-relative literal pools, and for SWI and emulated instruction handlers to
work.

Note

This is unlike some Harvard arrangements whereby the I-bus can be connected to the
ROM and the D-bus only connected to RAM/peripherals.





6.2.2 Transfer size 


The ARM926EJ-S processor performs all AHB accesses as single byte, single
halfword, single word, bursts of four words, or bursts of eight words. Any ARM9EJ-S
core requests that are not 1, 4, or 8 words in size are split into packets of these sizes. For
example, an STM of 12 words is performed on the AHB as a burst of 8 followed by a
burst of 4. If a burst is interrupted because of either a Split or Retry response, or by
removal of HGRANT, then the burst is completed as single transfers. Consequently the
ARM926EJ-S processor only uses a subset of the possible HBURST and HSIZE
encodings.
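
The splitting rule can be illustrated with a greedy decomposition into packets of 8, 4, and 1 words. The greedy largest-first order is an assumption that happens to reproduce the 12-word STM example. The real BIU may choose packet boundaries differently, for example because of address alignment, and the function name is hypothetical.

```python
def split_into_packets(words):
    """Split a request of `words` words into packets of 8, 4, or 1 words."""
    packets = []
    for size in (8, 4, 1):
        while words >= size:
            packets.append(size)
            words -= size
    return packets
```

Under this assumption a 12-word request becomes a burst of 8 followed by a burst of 4, and a 3-word request becomes three single transfers.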

Table 6-1 shows the HBURST encodings that the ARM926EJ-S processor uses, and the
operations that perform each burst size.

Table 6-1 Supported HBURST encodings

HBURST[2:0]  Description                    Operation
Single       Single transfer                Single transfer of word, halfword, or byte:
                                            o data write (NCNB, NCB, WT, or WB that has missed in DCache)
                                            o data read (NCNB or NCB)
                                            o NC instruction fetch (prefetched and non-prefetched)
                                            o page table walk read
                                            o continuation of a burst that either lost grant or received a
                                              Split/Retry response.
Incr4        Four-word incrementing burst   Half-line cache write-back. Instruction prefetch, if enabled.
                                            Four-word burst NCNB, NCB, WT, or WB write.
Incr8        Eight-word incrementing burst  Full line cache write-back. Eight-word burst NCNB, NCB, WT, or
                                            WB write.
Wrap8        Eight-word wrapping burst      Cache linefill.





Note 

Incr4 and Incr8 bursts can be aligned to any word boundary. The ARM926EJ-S 
processor performs all Thumb instruction fetches as word-wide transfers on the AHB. 
See Mapping of level one and level two AHB attributes on page 6-5. 


All burst reads and writes are performed by the ARM926EJ-S processor as word-wide 
transfers (HSIZE[2:0] = 010). Single reads and writes are performed as byte 
(HSIZE[2:0] = 000), halfword (HSIZE[2:0] = 001), or word wide transfers 
(HSIZE[2:0] = 010). 


Note 


The ARM926EJ-S processor does not generate BUSY transfers on either DHTRANS
or IHTRANS.

6.2.3 Mapping of level one and level two AHB attributes

Table 6-2 shows the IHPROT[3:0] and DHPROT[3:0] mappings for memory
operations.

Table 6-2 IHPROT[3:0] and DHPROT[3:0] attributes

Operation                      HPROT[3:0] (a)  Description
DCache linefill                11P1 (b)        CB, data access
ICache linefill                11P0 (b)        CB, opcode fetch
Page table walk (data)         1111            Page table walk caused by a TLB miss for a data access
Page table walk (instruction)  1110            Page table walk caused by a TLB miss for an instruction fetch
Instruction fetch              00P0 (b)        NCNB opcode fetch
Data access  LDR/STR           00P1 (b)        NCNB
                               01P1 (b)        NCB
             STR               11P1 (b)        WT/WB
DCache write-back              1111            -

a. This is either IHPROT[3:0] or DHPROT[3:0].
b. P is 1 if the access is caused by a privileged access by the core, and 0 if it is caused by a user access.


Table walk reads that occur because of TLB misses for both data and instructions are
performed using the data side bus master. The state of DHPROT[0] can be used to
identify if a table walk is caused by an instruction fetch miss in the TLB:

DHPROT[0] = 0    Indicates that an instruction fetch miss caused the page table walk.
DHPROT[0] = 1    Indicates that a data access miss caused the page table walk.
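
The parameterized rows of Table 6-2 follow a simple bit layout, which the sketch below encodes. The helper names are hypothetical. The bit meanings (bit 3 cacheable, bit 2 bufferable, bit 1 privileged, bit 0 data access) are read off the CB, NCB, and NCNB rows and footnote b.

```python
def hprot(cacheable, bufferable, privileged, data_access):
    """Compose a 4-bit HPROT value from its attribute bits (each 0 or 1)."""
    return (cacheable << 3) | (bufferable << 2) | (privileged << 1) | data_access

def walk_caused_by_instruction_fetch(dhprot):
    """For a page table walk on the data master, DHPROT[0] = 0 means instruction side."""
    return (dhprot & 1) == 0

# Example: a user-mode NCB data access encodes as 0101.
ncb_user_data = hprot(0, 1, 0, 1)
```

The second helper applies only to page table walks, where bit 0 identifies which side missed in the TLB.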


Attributes specified for LDR instructions also apply for LDM, LDRD, and LDC 
operations. Similarly those for STR apply for STM, STRD, and STC operations. 


A DCache write-back can be caused either by an eviction during a linefill, or an explicit 
cache clean operation. 


6.2.4 Byte and halfword accesses 

This section describes byte and halfword accesses for:
o Address alignment on page 6-6
o Thumb instruction fetches on page 6-6
o Endianness and byte lane indication.


Address alignment 


The ARM926EJ-S BIU performs address alignment checking and aligns AHB 
addresses to the necessary boundary. 16-bit accesses are aligned to halfword 
boundaries, and 32-bit accesses to word boundaries. 


Thumb instruction fetches 


All instruction fetches, irrespective of the state of the ARM9EJ-S core, are made as 
32-bit accesses on the AHB. If the ARM9EJ-S core is in Thumb state, then two 
instructions can be fetched at a time. 


Endianness and byte lane indication 


The AMBA Specification (Rev 2.0) does not specify any explicit support for endianness. 
The ARM926EJ-S processor provides a supplementary signal, DHBL, that indicates 
which bytes are to be updated for write transfers and which bytes must contain valid 
data for reads. This is created using the address, and the endianness of the access.

The CFGBIGEND signal indicates the current endianness setting used by the
ARM9EJ-S core, and reflects the value held in CP15 c1. See Control Register c1 on
page 2-12.

Because writes are buffered, the value of the CFGBIGEND signal might be
inconsistent with DHBL if the write-buffer is not drained before changing the
endianness setting in the control register.

DHBL is encoded in little-endian format. For example, a value of b0001 indicates byte
0 in little-endian mode, and byte 3 in big-endian mode.
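
The DHBL interpretation can be sketched as a mapping from asserted byte lanes to byte numbers within the word. The function name is hypothetical, and extending the single-byte example in the text to several asserted lanes is an assumption.

```python
def dhbl_to_bytes(dhbl, big_endian):
    """Return the byte numbers (within the word) selected by DHBL[3:0]."""
    lanes = [lane for lane in range(4) if (dhbl >> lane) & 1]
    if big_endian:
        return sorted(3 - lane for lane in lanes)
    return lanes
```

A DHBL value of b0001 maps to byte 0 in little-endian mode and to byte 3 in big-endian mode, as in the example above.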


6.2.5 AHB system considerations 

This section describes AHB considerations for:
o Single-layer AHB systems 

o Multi-layer AHB systems on page 6-7 

o Multi-AHB systems on page 6-8 

o Memory coherency on page 6-9. 


Single-layer AHB systems 


If the ARM926EJ-S processor is to be used in a single-layer AHB system, each of the
two BIU masters must be treated as being unique.

The simplest way of integrating the two ARM926EJ-S bus masters into a single-layer
AHB system is for each master to be a separate requestor into the AHB arbiter, the same
as for any multi-master system. The data master normally has higher arbitration priority
than the instruction master.





Note 


The ARM926EJ-S instruction AHB interface does not perform locked transfers so 
IHLOCK is always driven LOW. 


DHCLKEN and IHCLKEN must be tied together, as described in AHB clocking on
page 6-10. If HCLK and CLK are the same frequency, DHCLKEN and IHCLKEN
must be tied HIGH.

Because of the handover cycle when transferring ownership of the bus, a nongranted bus
master incurs an extra cycle of latency to get onto the bus if the bus is currently idle.
This means that if the data BIU is the default bus master, it can start AHB transactions
a cycle earlier than the instruction BIU, the nondefault bus master, which must wait for
ownership of the bus to be handed over.

This cycle of latency only exists for the first transaction. If the instruction BIU is
prefetching instructions, for example, it can perform back-to-back requests and
maintain ownership of the bus until a higher priority bus master is granted.


Multi-layer AHB systems 


Figure 6-1 on page 6-8 shows an example of a Multi-layer AHB system. In this example
the I-interface labeled I-side, and the D-interface labeled D-side are configured through
an interconnect matrix to have access to four slave ports. If the two AHB interfaces,
known as layers, require access to the same slave at the same time, then an arbitration
process within the interconnect matrix determines the layer that has the highest priority.
Under this system D-side can have access to any slave port that I-side is not using at that
time. This increases the overall bus bandwidth available.
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Interconnect 
matrix 


Slave 
* 1 


DMA 


master 
Slave 


H 2 


I-side 


master Slave 


H 3 


ARM926EJ-S 


processor 
Slave 
D-side H 4 


master 





Figure 6-1 Multi-layer AHB system example 


Multi-layer AHB is described in more detail in the Multi-layer AHB Overview.


Multi-AHB systems 


The ARM926EJ-S instruction and data AHB interfaces can be connected to separate
AHB systems, although there must be a mechanism to support data side access to the
instruction memory. Each AHB system can run at a different frequency. The
ARM926EJ-S processor supports this by providing two HCLKEN inputs:


o DHCLKEN is used to specify the rising HCLK edge for the system in which the 
data BIU is the master 


o IHCLKEN is used to specify the rising HCLK edge for the system in which the 
instruction BIU is the master.


Figure 6-2 on page 6-9 shows an example of a multi-AHB system.

[Figure: the ARM926EJ-S processor, with DHCLKEN and IHCLKEN inputs, connects over D-AHB to a D-AHB subsystem and over I-AHB to an I-AHB subsystem. A D-AHB to I-AHB bridge links the two subsystems.]

Figure 6-2 Multi-AHB system example


If both AHB systems operate at the same frequency, DHCLKEN and IHCLKEN must
be tied together. See AHB clocking on page 6-10 for more details.


The AHB clock for each system, HCLK1 and HCLK2, must be synchronized to the
ARM926EJ-S clock signal CLK.


Memory coherency 


Because of the Harvard nature of the ARM926EJ-S processor, instruction and data flow 
order cannot be guaranteed, and the arbitration order of the two masters can be 
considered to be arbitrary. 


For single and multi-layer AHB systems: 


o the arbitration priority of the two masters determines which of the masters is
granted the bus, if both make a simultaneous request


o if the granted master receives a Split or Retry response, the other master can be
granted the bus and complete its transaction before the split master completes.


For multi-AHB systems: 
o the two systems can be operating at different frequencies 
o the memory slaves can insert waits and/or issue Split or Retry responses. 


If the sequence of flow is critical, in self-modifying code for example, an Instruction
Memory Barrier (IMB) must be used to force coherency. See Chapter 9 Instruction
Memory Barrier for more details.

6.2.6 AHB clocking


The ARM926EJ-S design uses a single clock, CLK. To run the ARM926EJ-S processor 
at a higher frequency than the AHB system bus, a separate AHB clock enable for each 
of the two bus masters is required. In a multi-AHB system each AHB system can be
running at a different frequency: 


DHCLKEN Is used to signify the rising edge of HCLK for the system data 
BIU bus master. 


IHCLKEN Is used to signify the rising edge of HCLK for the system 
instruction BIU bus master. 


Figure 6-3 shows the relationships between CLK, HCLK, DHCLKEN, and
IHCLKEN.

[Figure: timing diagram of CLK, HCLK, and D/IHCLKEN, annotated with the skew between CLK and HCLK.]

Figure 6-3 AHB clock relationships


For single and multi-layer AHB systems, DHCLKEN and IHCLKEN must be tied 
together. If HCLK and CLK are the same frequency, the relevant HCLKEN input, or 
inputs, must be tied HIGH. 


CLK and HCLK must be synchronous. The skew between CLK and HCLK must be 
minimized. 
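The role of the clock enables can be sketched as a small C model. This is only an illustration under stated assumptions: the CLK:HCLK ratio is an integer, rising CLK edges are numbered from 0, and edge 0 is assumed to coincide with a rising HCLK edge. The function name hclken_level is hypothetical, and the exact phase of the enable relative to HCLK is defined by Figure 6-3, not by this model.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of an HCLKEN input (DHCLKEN or IHCLKEN) for an integer
 * CLK:HCLK frequency ratio. Rising CLK edges are numbered from 0, and
 * edge 0 is assumed to coincide with a rising HCLK edge.
 * Returns true when the enable marks a rising HCLK edge. */
bool hclken_level(unsigned clk_edge, unsigned ratio)
{
    assert(ratio >= 1u);            /* HCLK is never faster than CLK */
    return (clk_edge % ratio) == 0u;
}
```

With a ratio of 1 the model returns true on every edge, which matches the requirement to tie the enable HIGH when HCLK and CLK are the same frequency.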


6.2.7 External Abort limitations

Only certain types of accesses cause an External Abort if an Error response is returned
for an AHB transfer. These are:


o page table walk 

o noncached read 

o nonbuffered write 

o noncached read-lock-write (SWP). 


For all other types of access (cache linefills, writeback evictions, buffered writes), an
Error response is ignored.

If the ARM926EJ-S processor is to be used in a system that has to be tolerant to soft
errors in external memory, then both soft error detection and correction must be done in
hardware at the time the AHB transfer is made. The DHREADY and IHREADY
signals can be used to extend the transfer until corrected data is available.
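The External Abort rule in this section can be summarized as a lookup. The following C sketch is illustrative only: the enumeration and function names are hypothetical, and the access types are the ones listed in the text.

```c
#include <stdbool.h>

/* AHB access types named in this section (enum names are illustrative). */
typedef enum {
    ACCESS_PAGE_TABLE_WALK,
    ACCESS_NONCACHED_READ,
    ACCESS_NONBUFFERED_WRITE,
    ACCESS_NONCACHED_SWP,        /* noncached read-lock-write */
    ACCESS_CACHE_LINEFILL,
    ACCESS_WRITEBACK_EVICTION,
    ACCESS_BUFFERED_WRITE
} ahb_access_t;

/* Returns true if an AHB Error response on this access type causes an
 * External Abort, and false if the Error response is ignored. */
bool error_causes_external_abort(ahb_access_t access)
{
    switch (access) {
    case ACCESS_PAGE_TABLE_WALK:
    case ACCESS_NONCACHED_READ:
    case ACCESS_NONBUFFERED_WRITE:
    case ACCESS_NONCACHED_SWP:
        return true;
    default:
        return false;            /* linefills, evictions, buffered writes */
    }
}
```

A system model could use such a function to decide which memory regions need hardware error correction, because errors on the ignored access types are never reported to software.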
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Chapter 7 
Noncacheable Instruction Fetches 


This chapter describes noncacheable instruction fetches in the ARM926EJ-S processor. 
It contains the following section: 


o About noncacheable instruction fetches on page 7-2.

7.1 About noncacheable instruction fetches


The ARM926EJ-S processor performs speculative noncacheable instruction fetches to
increase performance. Speculative instruction fetching is enabled at reset. This can be
disabled using bit 16 in the debug state register CP15 c15. See Test and Debug Register
c15 on page 2-34. If prefetching is disabled, only instruction fetches issued directly by
the ARM9EJ-S core result in instruction fetches on the AHB interface.


The following subsection is divided into:
o Uses of noncacheable code

o Self modifying code

o AHB behavior on page 7-3.


7.1.1 Uses of noncacheable code


Although noncacheable code performance has been improved compared with other
ARM9 family cached cores, it is still recommended that the ICache is used in
preference, where practical.


Noncacheable code has previously been used for boot loaders of operating systems and
for preventing cache pollution. It is worth noting that the ICache can be enabled without
the MMU being enabled, see Chapter 4 Caches and Write Buffer, and that cache
pollution can be controlled using the cache lockdown register, see Cache Lockdown and
TCM Region Registers c9 on page 2-25.


7.1.2 Self modifying code


A four-word buffer is used to hold speculatively fetched instructions. Only sequential
instructions are fetched speculatively, and in the event of the ARM9EJ-S core issuing a
nonsequential instruction fetch, the contents of the buffer are discarded (flushed). In
situations where the contents of the prefetch buffer might become invalid during a
sequence of sequential instruction fetches by the ARM9EJ-S core, for example turning
the MMU on or off, or turning on the ICache, the prefetch buffer is also flushed. This
avoids the requirement for an explicit Instruction Memory Barrier (IMB) operation to
be performed, except when self-modifying code is used. Because the prefetch buffer is
flushed when the ARM9EJ-S core issues a nonsequential instruction fetch, a branch
instruction, or equivalent, can be used to implement the required IMB behavior. This is
illustrated by the following code sequence:


        LDMIA   R0,{R1-R5}          ; load code sequence into R1-R5
        ADR     R0,self_mod_code
        STMIA   R0,{R1-R5}          ; store code sequence (nonbuffered region)
        B       self_mod_code       ; branch to modified code

self_mod_code:

This IMB implementation only applies to the ARM926EJ-S processor running code
from a noncacheable region of memory. If code is run from a cacheable region of
memory, or a different device is used, then a different IMB implementation is required.
IMBs are described in Chapter 9 Instruction Memory Barrier.


7.1.3 AHB behavior

If instruction prefetching is disabled, all instruction fetches appear on the AHB interface
as single, nonsequential fetches.


If prefetching is enabled then instruction fetches either appear as bursts of four
instructions, or as single, nonsequential fetches. No speculative instruction fetching is
done across a 1KB boundary.


All instruction fetches, including those made in Thumb state, are word transfers (32 
bits). In Thumb state a single-word instruction fetch reads two Thumb instructions, and 
a four-word burst reads eight instructions. 
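The fetch arithmetic above can be checked with a short C sketch. It is a model of the stated constraints only, not of the processor logic: the function and macro names are hypothetical, and how the processor chooses between a burst and single fetches near a boundary is not described here.

```c
#include <stdbool.h>
#include <stdint.h>

#define FETCH_WORD_BYTES 4u      /* every instruction fetch is a 32-bit word */
#define BURST_WORDS      4u      /* prefetch burst length in words */
#define BOUNDARY_BYTES   1024u   /* no speculative fetch across a 1KB boundary */

/* Number of instructions delivered by a fetch of `words` words:
 * one ARM instruction per word, two Thumb instructions per word. */
unsigned instructions_per_fetch(unsigned words, bool thumb_state)
{
    return thumb_state ? words * 2u : words;
}

/* True if a four-word burst starting at word-aligned `addr` would
 * extend past the 1KB block that contains `addr`. */
bool burst_crosses_1kb(uint32_t addr)
{
    uint32_t last_byte = addr + BURST_WORDS * FETCH_WORD_BYTES - 1u;
    return (addr / BOUNDARY_BYTES) != (last_byte / BOUNDARY_BYTES);
}
```

For example, a burst starting at 0x3F0 ends at 0x3FF and stays inside the first 1KB block, while a burst starting at 0x3F4 would run into the next block.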
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Chapter 8 
Coprocessor Interface 


This chapter describes the ARM926EJ-S coprocessor interface. It contains the
following sections: 


o About the ARM926EJ-S external coprocessor interface on page 8-2 
o LDC/STC on page 8-4 

o MCR/MRC on page 8-6 

o CDP on page 8-8 

o Privileged instructions on page 8-9 

o Busy-waiting and interrupts on page 8-10 

o CPBURST on page 8-11 

o CPABORT on page 8-12 

o nCPINSTRVALID on page 8-13.

8.1 About the ARM926EJ-S external coprocessor interface




The ARM926EJ-S supports the connection of on-chip coprocessors to the ARM9EJ-S 
core through an external coprocessor interface. All types of coprocessor instructions are 
supported. 


Coprocessors determine the instructions that they have to execute by using a pipeline
follower in the coprocessor. As each instruction arrives from memory, it enters both the
ARM9EJ-S pipeline and the coprocessor pipeline. To avoid a critical path for the
instruction being latched by the coprocessor, the coprocessor pipeline must operate one
clock cycle behind the ARM9EJ-S core pipeline.


The two pipelines are synchronized by stalling the ARM9EJ-S core pipeline in its first
Execute cycle whenever an external coprocessor instruction moves from the Decode to
the Execute stage.


To enable coprocessors to continue execution of coprocessor data operations while the
ARM9EJ-S core pipeline is stalled, for example while waiting for a cache linefill to
occur, the coprocessor receives the clock CLK, and a clock enable signal CPCLKEN.
You can use these to produce a gated coprocessor clock with the circuit shown in
Figure 8-1.


[Figure: CLK and CPCLKEN are combined by a gating circuit to produce the coprocessor clock, Coproc clock.]

Figure 8-1 Producing a coprocessor clock


Figure 8-2 indicates the timing for these signals and when the coprocessor pipeline
must advance its state.

[Figure: timing diagram of CLK, CPCLKEN, and Coproc clock.]

Figure 8-2 Coprocessor clocking


This is one technique for generating a clock that reflects the ARM9EJ-S core pipeline
advancing. If CPCLKEN is LOW on the rising edge of CPCLK then the ARM9EJ-S
core pipeline is stalled and the coprocessor pipeline must not advance.

8.1.1 Coprocessor instructions
There are three classes of coprocessor instructions: 


LDC or STC Load coprocessor register from memory or store coprocessor 
register to memory. 


MCR/MCRR or MRC/MRRC 
Register transfer between the coprocessor and the ARM processor 
core. 

CDP Coprocessor data operation. 


Examples of how a coprocessor must execute these instruction classes are given in:
o LDC/STC on page 8-4

o MCR/MRC on page 8-6

o CDP on page 8-8.

8.2 LDC/STC
The cycle timing for this operation is shown in Figure 8-3. 


[Figure: timing diagram showing the coprocessor pipeline stages Fetch, Decode, Execute (GO), Execute (GO), Execute (GO), Execute (LAST), Memory, and Write against CLK, CPINSTR[31:0], nCPMREQ, CPPASS, CPLATECANCEL, CHSDE[1:0], CHSEX[1:0], CPDOUT[31:0] (LDC), and CPDIN[31:0] (STC).]

Figure 8-3 LDC/STC cycle timing


In Figure 8-3 four words of data are transferred. The number of words transferred is
determined by how the coprocessor drives the CHSDE[1:0] and CHSEX[1:0] buses.


As with all other instructions, the ARM9EJ-S core performs the main decode off the
rising edge of the clock during the Decode stage. From this, the core commits to
executing the instruction and so performs an instruction fetch. The coprocessor
instruction pipeline keeps in step with the ARM9EJ-S core by monitoring nCPMREQ.
nCPMREQ is an active LOW signal that indicates if the ARM9EJ-S pipeline has
advanced. CPINSTR is updated with the fetched instruction in the next cycle. This
means that the instruction currently on CPINSTR must enter the Decode stage of the
coprocessor pipeline, and that the instruction in the Decode stage of the coprocessor
pipeline must enter its Execute stage.


During the Execute stage, the condition codes are combined with the flags to determine
if the instruction executes or not. The output CPPASS is asserted HIGH if the
instruction in the Execute stage of the coprocessor pipeline:


o is a coprocessor instruction
o has passed its condition codes.

If a coprocessor instruction busy-waits then CPPASS is asserted on every cycle until
the coprocessor instruction is executed. If an interrupt occurs during busy-waiting then
CPPASS is driven LOW and the coprocessor must stop the coprocessor instruction
execution.


Another output, CPLATECANCEL, is used to cancel a coprocessor instruction when
the instruction preceding it caused a Data Abort. This is valid on the rising edge of CLK
on the cycle after the first coprocessor Execute cycle of a coprocessor instruction.


On the rising edge of the clock the ARM9EJ-S core examines the coprocessor
handshake signals CHSDE[1:0] and CHSEX[1:0]:


o if a new instruction is entering the Execute stage in the next cycle, then it
examines CHSDE[1:0]
o if the coprocessor instruction currently in Execute requires another Execute cycle,
then it examines CHSEX[1:0].
The handshake signals encode one of four states, as shown in Table 8-1.


Table 8-1 Handshake signal encoding

State [1:0] Description


WAIT 00

If there is a coprocessor attached that can handle the instruction, but not immediately, then the
coprocessor handshake signals are driven to indicate that the ARM9EJ-S core has stalled. This is
known as the busy-wait condition. In the busy-wait condition, the ARM9EJ-S core loops in an idle
state waiting for CHSEX[1:0] to be driven to another state, or for an interrupt to occur. If
CHSEX[1:0] changes to ABSENT then the undefined instruction trap is taken. If CHSEX[1:0]
changes to GO or LAST then the instruction proceeds, as described in GO. If an interrupt occurs
then the ARM9EJ-S core is forced out of the busy-wait state. This is indicated to the coprocessor
by the CPPASS signal going LOW. When the instruction is restarted the coprocessor must not
commit to the instruction, that is, change any of the coprocessor state, until the coprocessor has
seen CPPASS HIGH when the handshake signals indicate the GO or LAST condition.


GO 01

The GO state indicates that the coprocessor can execute the instruction immediately, and that it
requires another cycle of execution. Both the ARM9EJ-S core and the coprocessor must consider
the state of the CPPASS signal before committing to the instruction. For an LDC or STC
instruction, the coprocessor drives the handshake signals with GO when two or
more words still have to be transferred. When only one more word is required the coprocessor
drives the handshake signals with LAST.


ABSENT 10

If there is no coprocessor attached that can execute the coprocessor instruction, then the handshake
signals indicate the ABSENT state and the ARM9EJ-S core takes the undefined instruction trap.


LAST 11

An LDC or STC instruction might transfer more than one word of data. If this is the case then,
possibly after busy waiting, the coprocessor drives the coprocessor handshake signals with a
number of GO states, followed by a LAST cycle. The LAST indicates that the next transfer is the
final one. If there was only one transfer then the sequence would be [WAIT, [WAIT, ...]], LAST.
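The GO and LAST rules for LDC and STC can be sketched in C as a sequence generator. This is a hedged illustration: the type and function names are hypothetical, GO (b01) and ABSENT (b10) are taken from the text, and WAIT and LAST are assumed to be the two remaining encodings, b00 and b11. The first entry of the sequence is the value driven on CHSDE[1:0], and later entries are driven on CHSEX[1:0].

```c
#include <stddef.h>

/* Handshake states (values are the assumed [1:0] encodings). */
typedef enum {
    HS_WAIT   = 0x0,
    HS_GO     = 0x1,
    HS_ABSENT = 0x2,
    HS_LAST   = 0x3
} handshake_t;

/* Fills `seq` with the handshake states a coprocessor drives for an
 * LDC or STC of `words` words after `waits` busy-wait cycles.
 * Returns the number of entries written, or 0 if `words` is zero or
 * `cap` is too small. */
size_t ldc_stc_handshake(handshake_t *seq, size_t cap,
                         size_t waits, size_t words)
{
    size_t n = 0;
    size_t i;

    if (words == 0 || waits + words > cap) {
        return 0;
    }
    for (i = 0; i < waits; i++) {
        seq[n++] = HS_WAIT;      /* busy-wait cycles */
    }
    for (i = 0; i + 1 < words; i++) {
        seq[n++] = HS_GO;        /* two or more words still to transfer */
    }
    seq[n++] = HS_LAST;          /* only one more word required */
    return n;
}
```

A four-word transfer with no busy-waiting produces GO, GO, GO, LAST, which matches the four-word example in Figure 8-3.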
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8.3 MCR/MRC 


These cycles look very similar to STC/LDC. An example with a busy-wait state is
shown in Figure 8-4.


[Figure: timing diagram showing the coprocessor pipeline stages, including Execute (WAIT), Execute (LAST), Memory, and Write, against CLK, CPINSTR[31:0] (MCR/MRC), nCPMREQ, CPPASS, CPLATECANCEL, and CHSEX[1:0].]

Figure 8-4 MCR/MRC cycle timing





First, nCPMREQ is driven LOW to indicate that the instruction on CPINSTR is
entering the Decode stage of the pipeline. The coprocessor decodes the new instruction
and drives CHSDE[1:0] as required.


In the next cycle, nCPMREQ is driven LOW to indicate that the instruction has now
been issued to the Execute stage. If the condition codes pass and the instruction is to be
executed, the CPPASS signal is driven HIGH and the CHSDE[1:0] handshake bus is
examined. It is ignored in all other cases.


For any successive execute cycles the CHSEX[1:0] handshake bus is examined. When
the LAST condition is observed, the instruction is committed. In the case of an MCR,
the CPDOUT[31:0] bus is driven with the register data during the coprocessor Write
stage. In the case of an MRC, CPDIN[31:0] is sampled at the end of the ARM9EJ-S
memory stage and written to the destination register during the next cycle.


8.3.1 Interlocked MCR


If the data for an MCR operation is not available inside the ARM9EJ-S core pipeline
during its first Decode cycle, then the ARM9EJ-S core pipeline interlocks for one or
more cycles until the data is available. An example of this is where the register being
transferred is the destination from a preceding LDR instruction. In this situation the
MCR instruction enters the Decode stage of the coprocessor pipeline, and remains there
for a number of cycles before entering the Execute stage.


Figure 8-5 shows an example of an interlocked MCR. 


[Figure: timing diagram showing the coprocessor pipeline stages Fetch, Decode, Decode (interlock), Execute (WAIT), Execute (LAST), Memory, and Write against CLK, CPINSTR[31:0] (MCR/MRC), nCPMREQ, CPPASS, CPLATECANCEL, CHSDE[1:0], CHSEX[1:0], and the MCR and MRC data transfers.]

Figure 8-5 Interlocked MCR

8.4 CDP


CDP instructions usually execute in a single cycle. Like all the previous cycles,
nCPMREQ is driven LOW to signal when an instruction is entering the Decode and
then the Execute stage of the pipeline. If the instruction is to be executed then the
CPPASS signal is driven HIGH during Execute. If the coprocessor can execute the
instruction immediately it drives CHSDE[1:0] with LAST. If the instruction requires a
busy-wait cycle, then the coprocessor drives CHSDE[1:0] with WAIT and then
CHSEX[1:0] with LAST. Figure 8-6 shows a CDP that is canceled because of the
previous instruction causing a Data Abort.


[Figure: timing diagram showing the coprocessor pipeline stages Fetch, Decode, Execute, and Memory, with the instruction aborted, against CLK, CPINSTR[31:0], nCPMREQ, CPPASS, CPLATECANCEL, and CHSDE[1:0].]

Figure 8-6 Latecanceled CDP


The CDP instruction enters the Execute stage of the pipeline and is signaled to execute
by CPPASS. In the following phase CPLATECANCEL is asserted. This causes the
coprocessor to terminate execution of the CDP instruction and for it to cause no state
changes to the coprocessor.





Note 


CPLATECANCEL can be asserted during the Memory cycle or during the Execute 
cycle. The coprocessor must be able to handle instruction aborts during these two 
stages. 
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8.5 Privileged instructions 


The coprocessor might restrict certain instructions for use in privileged modes only. To
do this, the coprocessor has to track the nCPTRANS output.


Figure 8-7 shows how nCPTRANS changes after a mode change. 


[Figure: timing diagram showing the coprocessor pipeline stages Fetch, Decode, Decode, Decode, Execute, and Memory against CLK, CPINSTR[31:0], nCPMREQ, nCPTRANS (changing from old mode to new mode), CHSDE[1:0] (Ignored, Ignored, LAST), and CHSEX[1:0].]

Figure 8-7 Privileged instructions

8.6 Busy-waiting and interrupts


The coprocessor is permitted to stall (busy-wait) the processor during the execution of
a coprocessor instruction if, for example, it is still busy with an earlier coprocessor
instruction. To do so, the coprocessor associated with the Decode stage instruction
drives WAIT on CHSDE[1:0]. When the instruction concerned enters the Execute stage
of the pipeline, the coprocessor can drive WAIT onto CHSEX[1:0] for as many cycles
as required to keep the instruction in the busy-wait loop.


For interrupt latency reasons the coprocessor might be interrupted while busy-waiting,
causing the instruction to be abandoned using CPPASS. The coprocessor must monitor
the state of CPPASS during every busy-wait cycle. If it is HIGH the instruction must be
executed. If it is LOW the instruction must be abandoned.


Figure 8-8 shows a busy-waited coprocessor instruction being abandoned because of an 
interrupt. 


[Figure: timing diagram showing the coprocessor pipeline stages Fetch, Decode, Execute (WAIT), Execute (WAIT), Execute (WAIT), and Execute (interrupted) against CLK, CPINSTR[31:0], nCPMREQ, CPPASS, CPLATECANCEL, CHSDE[1:0], and CHSEX[1:0].]

Figure 8-8 Busy waiting and interrupts


In Figure 8-8, CPLATECANCEL is also asserted as a result of the Execute 
interruption. 
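The per-cycle decision a coprocessor makes while busy-waiting can be sketched as follows. This is a minimal model, not coprocessor RTL: the names are hypothetical, and `coprocessor_ready` is an assumed internal condition meaning that the coprocessor can stop driving WAIT and drive GO or LAST instead.

```c
#include <stdbool.h>

/* Possible outcomes of one busy-wait cycle (names are illustrative). */
typedef enum {
    BW_ABANDON,        /* CPPASS LOW: drop the instruction, change no state */
    BW_KEEP_WAITING,   /* CPPASS HIGH, still busy: keep driving WAIT */
    BW_COMMIT          /* CPPASS HIGH and ready: drive GO or LAST and commit */
} busy_wait_action_t;

/* Decides the action for one busy-wait cycle from the sampled CPPASS
 * level and the coprocessor's own readiness. */
busy_wait_action_t busy_wait_step(bool cppass, bool coprocessor_ready)
{
    if (!cppass) {
        return BW_ABANDON;
    }
    return coprocessor_ready ? BW_COMMIT : BW_KEEP_WAITING;
}
```

Checking CPPASS first reflects the requirement that an interrupted instruction must be abandoned even if the coprocessor has just become ready.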
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8.7 CPBURST 


The CPBURST signal is used by the external coprocessor to indicate the number of 
words to be transferred in an LDC or STC operation. CPBURST is used by the 
ARM926EJ-S memory system to optimize LDC/STC instructions that access either 
noncacheable or nonbufferable regions of memory. The encoding of CPBURST is 
shown in Table 8-2. 


Table 8-2 CPBURST encoding

CPBURST[3:0]   Number of words to transfer

b0000          1 word or unknown
b0001          2 words
b0010          3 words
...            ...
b1110          15 words
b1111          16 words


The encoding for a single word transfer and an unknown number of transfers is the
same. If CPBURST is set to b0000 for an STC or LDC operation, and this results in an
access to either a noncached or nonbuffered region of memory, then any resultant AHB
bus transfers are performed as individual nonsequential accesses.


CPBURST is driven by external coprocessors in the same cycle as the CHSDE
response. This must be driven to b0000 at all other times. An example of a transfer that
uses CPBURST is shown in Figure 8-9 on page 8-12.
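The CPBURST encoding can be expressed as a pair of C helpers. This is a sketch under one assumption: the table rows that are not listed explicitly follow the same pattern as the listed ones, so the encoded value is the word count minus one. The function names are hypothetical.

```c
#include <stdint.h>

#define CPBURST_UNKNOWN 0x0u   /* b0000: one word, or length not known */

/* Encodes a transfer length in words as a CPBURST[3:0] value.
 * Lengths outside 1..16 are reported as "unknown" (b0000). */
uint8_t cpburst_encode(unsigned words)
{
    if (words < 1u || words > 16u) {
        return CPBURST_UNKNOWN;
    }
    return (uint8_t)(words - 1u);
}

/* Decodes CPBURST[3:0] to a word count. A result of 1 can also mean
 * that the number of words is unknown. */
unsigned cpburst_decode_words(uint8_t cpburst)
{
    return (cpburst & 0xFu) + 1u;
}
```

Because b0000 is shared between one word and unknown, a decoder cannot tell the two cases apart, which is why such transfers are performed as individual nonsequential accesses.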
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8.8 CPABORT 


The CPABORT signal being asserted HIGH indicates that an LDC/STC instruction has 
aborted. CPABORT is asserted in the cycle after the Memory stage of the aborting 
LDC/STC instruction. This is shown in Figure 8-9. 


[Figure: timing diagram showing two overlapping coprocessor instructions (Fetch, Decode, Execute 1, Memory 1, Write 1 and Execute 2, Memory 2, Write 2) against CLK, CPINSTR[31:0] (LDC/STC), nCPMREQ, CHSDE[1:0], CHSEX[1:0], CPBURST, and CPABORT.]

Figure 8-9 CPBURST and CPABORT timing

8.9 nCPINSTRVALID


ARM DDI 0198E 


The nCPINSTRVALID signal indicates if the instruction currently on the CPINSTR
bus is valid, and must be decoded by the coprocessor. If nCPINSTRVALID is 1, then
the instruction must not be decoded by the coprocessor and an ABSENT response must
be made for all corresponding Decode cycles for this instruction.


nCPINSTRVALID is the equivalent of the CPTBIT signal in the ARM946E-S and
ARM966E-S processors.

8.10 Connecting multiple external coprocessors


If multiple coprocessors are connected to the ARM926EJ-S processor, then outputs of
the various coprocessors must be combined to form a single set of coprocessor inputs.
The coprocessor handshake signals are combined together by ANDing the top bit and
ORing the bottom bit. This enables a coprocessor to produce a fixed response of b10
(Absent), when it is inactive. The other external coprocessor inputs, CPDIN and
CPBURST, are combined by ORing. This is shown in Figure 8-10.


[Figure: CHSDEa[1] and CHSDEb[1] are ANDed to form CHSDE[1], and CHSEXa[1] and CHSEXb[1] are ANDed to form CHSEX[1]. CHSDEa[0] and CHSDEb[0] are ORed to form CHSDE[0], and CHSEXa[0] and CHSEXb[0] are ORed to form CHSEX[0]. CPBURSTa[3:0] and CPBURSTb[3:0] are ORed to form CPBURST[3:0], and the CPDINa and CPDINb buses are ORed to form CPDIN.]

Figure 8-10 Arrangement for connecting two coprocessors


The OR arrangement for CPBURST and CPDIN means that coprocessors must drive 
zero values onto their CPBURST and CPDIN outputs when they are inactive, or do not 
own the corresponding coprocessor pipeline stage associated with these signals. 
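The combining rules for two coprocessors can be modeled in a few lines of C. The function names are hypothetical, and the model covers only the logic described above, not timing.

```c
#include <stdint.h>

#define HS_ABSENT 0x2u  /* b10, the fixed response of an inactive coprocessor */

/* Combines two 2-bit handshake values: AND the top bits, OR the bottom bits. */
uint8_t combine_handshake(uint8_t a, uint8_t b)
{
    uint8_t top    = (uint8_t)(((a >> 1) & (b >> 1)) & 1u);
    uint8_t bottom = (uint8_t)((a | b) & 1u);
    return (uint8_t)((top << 1) | bottom);
}

/* CPDIN and CPBURST are combined by ORing, so an inactive coprocessor
 * must drive zero on both. */
uint32_t combine_cpdin(uint32_t a, uint32_t b)
{
    return a | b;
}

uint8_t combine_cpburst(uint8_t a, uint8_t b)
{
    return (uint8_t)((a | b) & 0xFu);
}
```

When one input is b10 (Absent), the combined handshake equals the other input, so the active coprocessor's response passes through unchanged.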
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Chapter 9 
Instruction Memory Barrier 


This chapter describes the ARM926EJ-S Instruction Memory Barrier (IMB) operation. 
It contains the following sections: 


o About the instruction memory barrier operation on page 9-2 
o IMB operation on page 9-3 
o Example IMB sequences on page 9-5.

9.1 About the instruction memory barrier operation


Whenever code is treated as data, for example self-modifying code, or loading code into 
memory, then a sequence of instructions called an Instruction Memory Barrier (IMB) 
operation must be used to ensure consistency between the data and instruction streams 
processed by the ARM926EJ-S processor. 


Usually the instruction and data streams are considered to be completely independent
by the ARM926EJ-S processor memory system, and any changes in the data side are
not automatically reflected in the instruction side. For example if code is modified in
main memory then the ICache might contain stale entries. To remove these stale entries
part or all of the ICache must be invalidated.

9.2 IMB operation


To ensure consistency between data and instruction sides, you must take the following 
steps: 

1. Clean the DCache

2. Drain the write buffer

3. Synchronize data and instruction streams in level two AHB subsystems

4. Invalidate the ICache on page 9-4

5. Flush the prefetch buffer on page 9-4.


9.2.1 Clean the DCache 


If the cache contains cache lines corresponding to write-back regions of memory, then
it might contain dirty entries. These entries must be cleaned to make external memory
consistent with the DCache. If only a small part of the cache has to be cleaned, then this
can be done by using a sequence of clean DCache single entry instructions, or if the
entire cache has to be cleaned, then this can be done efficiently using the test and clean
instruction. See Cache Operations Register c7 on page 2-19 for details of cache
maintenance operations.


9.2.2 Drain the write buffer


Executing a drain write buffer instruction causes the ARM9EJ-S core to wait until
outstanding buffered writes have completed on the AHB interface. This includes writes
that occur as a result of data being written back to main memory because of clean
operations, and data for store instructions.


9.2.3 Synchronize data and instruction streams in level two AHB subsystems 




The level two AHB subsystem might also require explicit synchronization between data
and instruction sides. It is possible for the data and instruction AHB masters to be
attached to different AHB subsystems. Even if both masters are present on the same bus,
some form of separate ICache might exist for performance reasons, and this has to be
invalidated to ensure consistency.


The process of synchronizing instructions and data in level two memory must be
invoked using some form of fully blocking operation. This is to ensure that the end of
the operation can be determined using software. It is recommended that either a
nonbuffered store (STR) or a noncached load (LDR) is used to trigger external
synchronization.

9.2.4 Invalidate the ICache


The ICache must be invalidated to remove any stale copies of instructions that are no 
longer valid. If the ICache is not being used, or the modified regions are not in cacheable 
areas of memory, then this might not be required. 


9.2.5 Flush the prefetch buffer 


To ensure consistency, the prefetch buffer must be flushed before self-modifying code
is executed. See Self modifying code on page 7-2.

9.3 Example IMB sequences


The following sequence corresponds to steps 1-4 in IMB operation on page 9-3:


clean_loop
        MRC p15, 0, r15, c7, c10, 3   ; clean entire dcache using test and clean
        BNE clean_loop
        MCR p15, 0, r0, c7, c10, 4    ; drain write buffer
        STR rx, [ry]                  ; nonbuffered store to signal L2 world to
                                      ; synchronize
        MCR p15, 0, r0, c7, c5, 0     ; invalidate icache


The following sequence illustrates an IMB sequence used after modifying a single
instruction, for example setting a software breakpoint, with no external synchronization
required:

        STR rx, [ry]                  ; store that modifies instruction at address ry
        MCR p15, 0, ry, c7, c10, 1    ; clean dcache single entry (MVA)
        MCR p15, 0, r0, c7, c10, 4    ; drain write buffer
        MCR p15, 0, ry, c7, c5, 1     ; invalidate icache single entry (MVA)

Chapter 10
Embedded Trace Macrocell Support 


This chapter describes the Embedded Trace Macrocell (ETM) support for the 
ARM926EJ-S processor. It contains the following section: 


o About Embedded Trace Macrocell support on page 10-2.

10.1 About Embedded Trace Macrocell support


To support real-time trace, the ARM926EJ-S processor provides an interface to enable
connection of an Embedded Trace Macrocell (ETM). For more information on the
ETM, see the ETM9 Technical Reference Manual.


The ETM consists of two parts:


Trace port A trace protocol has been developed to provide a real-time trace
capability for processor cores that are deeply embedded in larger ASIC
designs. Because the ASIC normally includes significant amounts of
on-chip memory, it is not possible to determine how the processor core is
operating by only observing the pins of the ASIC. A trace port is required
to understand the operation of the processor.

Triggering facilities
An extensible specification exists, enabling you to specify the exact set
of trigger resources required for a particular application. Resources
include address and data comparators, counters, and sequencers.


The ETM is used to compress the trace information and export it through a narrow trace
port. An external Trace Port Analyzer (TPA) is used to capture the trace information.


The ARM926EJ-S ETM interface exports the required signals for the ETM to perform
trace. The interface is enabled and disabled by the ETMEN input signal. Where an
ETM module is not required, the ETMEN input can be tied LOW to disable the trace
outputs and save power.


10.1.1 FIFOFULL


Whenever the ETM FIFO fills up, the ETM asserts its FIFOFULL signal. To prevent
loss in trace coverage, the ARM926EJ-S processor stalls until FIFOFULL is
deasserted.


The ARM926EJ-S processor only stalls on instruction boundaries, to allow any AHB
transfers to complete. Programming of the ETM FIFO watermark must take this into
consideration. If the current instruction is either an LDM or an STM, then the FIFO might
have to accept up to 16 words after FIFOFULL has been asserted.


Interrupts, FIQ or IRQ, prevent the ARM926EJ-S processor from stalling when
FIFOFULL is asserted, unless they are masked. See Test and Debug Register c15 on
page 2-34 for details of how interrupts can be masked during trace.

Note


Stalling the core with FIFOFULL affects real-time operating performance. If
connected, an ETM must be disabled during normal ARM926EJ-S processor operation
to prevent FIFOFULL adversely affecting the ARM926EJ-S processor performance.

Chapter 11
Debug Support 


This chapter describes the debug support for the ARM926EJ-S processor. It contains the 
following section: 


o About debug support on page 11-2. 

11.1 About debug support 


Debug support is implemented by using the ARM9EJ-S core embedded within the 
ARM926EJ-S processor. Full details of the debug support provided by the ARM9EJ-S 
core are described in the ARM9EJ-S Technical Reference Manual. 


Debug support for the ARM926EJ-S memory system is implemented by extending the 
debug facilities to provide access to CP15 through an ARM9EJ-S external scan chain, scan 
chain 15. This scan chain is external to the ARM9EJ-S core but internal to the 
ARM926EJ-S processor. 


11.1.1 Debug clocks 


The system and test clocks must be synchronized externally to the ARM926EJ-S 
macrocell. Synchronizing off-chip debug clocking with the ARM926EJ-S macrocell 
requires a three-state synchronizer. This is described in the debug chapter of the 
ARM9EJ-S Technical Reference Manual. 


11.1.2 Scan chain 15 


Scan chain 15 enables access to the CP15 registers. Scan chain 15 is 48 bits long. 
Table 11-1 shows the bit assignments for scan chain 15. 


Table 11-1 Scan chain 15 format 

Bits      Function 
[47]      Write, not read (W/R) 
[46:33]   Register address 
[32]      Initiate access/access complete 
          When written: 
          0 = NOP 
          1 = initiate new access 
          When read: 
          0 = access incomplete 
          1 = access complete 
[31:0]    Data value 
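
The field layout in Table 11-1 can be sketched as a small packing helper. This C fragment is illustrative only: the names sc15_pack, sc15_access_complete, and sc15_data are hypothetical and do not belong to any ARM tool or library. It assumes the 48-bit scan chain value is held in the low 48 bits of a 64-bit integer.

```c
#include <stdint.h>

/* Field positions taken from Table 11-1. */
#define SC15_WNR_SHIFT   47      /* Write, not read */
#define SC15_ADDR_SHIFT  33      /* Register address, bits [46:33] */
#define SC15_ADDR_MASK   0x3FFFu /* 14-bit register address */
#define SC15_GO_SHIFT    32      /* Initiate access / access complete */

/* Hypothetical helper: build the 48-bit value to shift into scan chain 15. */
uint64_t sc15_pack(int write, uint16_t reg_addr, int initiate, uint32_t data)
{
    uint64_t v = 0;
    v |= (uint64_t)(write ? 1u : 0u) << SC15_WNR_SHIFT;
    v |= (uint64_t)(reg_addr & SC15_ADDR_MASK) << SC15_ADDR_SHIFT;
    v |= (uint64_t)(initiate ? 1u : 0u) << SC15_GO_SHIFT;
    v |= (uint64_t)data;
    return v;
}

/* Hypothetical helpers: decode a value shifted out of scan chain 15. */
int sc15_access_complete(uint64_t captured)
{
    return (int)((captured >> SC15_GO_SHIFT) & 1u);
}

uint32_t sc15_data(uint64_t captured)
{
    return (uint32_t)(captured & 0xFFFFFFFFu);
}
```

For example, a read request for register address 0 with bit 32 set packs to 0x000100000000, and a value shifted out with bit 32 set carries valid read data in bits [31:0].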


With scan chain 15 selected, TDI is connected to bit 47 and TDO is connected to bit 0. 

To perform an access using scan chain 15, you must: 


1. During the SHIFT-DR state of the TAP state machine, shift in the read/write bit, 
register address, and register data value for writing, with bit 32 set to 1. For read 
operations the data value field does not have to be written. 


2. Move through UPDATE-DR. The operation specified by the register address and 
write not read bits now starts. 


3. Return to SHIFT-DR and perform a shift operation so that bits 32 and [31:0] are 
read, and a NOP instruction, bit 32 = 0, is shifted in. 


4. Move through UPDATE-DR. No operation is performed because bit 32 is 0. 


5. Check the access complete value that is shifted out. If it is 1, the operation has 
completed and bits [31:0] contain valid data for reads. If it is 0, the access has not 
completed and you must go back to step 3. 
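
The five steps above amount to one request scan followed by a polling loop. The C sketch below shows that control flow for a read. It is not debugger code: sc15_scan_fn is a hypothetical transport hook that is assumed to shift 48 bits through SHIFT-DR, pass through UPDATE-DR, and return the bits shifted out, and sc15_mock_scan is a toy stand-in for a target that exists only so the loop can be exercised on a host.

```c
#include <stdint.h>

#define SC15_GO (1ULL << 32) /* initiate access (written), access complete (read) */

/* Hypothetical transport hook: one SHIFT-DR plus UPDATE-DR pass. */
typedef uint64_t (*sc15_scan_fn)(void *ctx, uint64_t value_in);

/* Returns 0 and stores the data on success, -1 if the access does not
   complete within max_polls polls. */
int sc15_read_reg(sc15_scan_fn scan, void *ctx, uint16_t reg_addr,
                  unsigned max_polls, uint32_t *data_out)
{
    /* Steps 1 and 2: W/R = 0 (read), register address, bit 32 = 1. */
    uint64_t request = ((uint64_t)(reg_addr & 0x3FFFu) << 33) | SC15_GO;
    (void)scan(ctx, request);

    /* Steps 3 to 5: shift in a NOP (bit 32 = 0) and inspect what comes out. */
    for (unsigned i = 0; i < max_polls; i++) {
        uint64_t out = scan(ctx, 0);
        if (out & SC15_GO) {
            *data_out = (uint32_t)out;
            return 0;
        }
    }
    return -1;
}

/* Toy target model for experimentation only: reports completion after
   'delay' NOP polls and then returns 'value'. */
struct sc15_mock { unsigned delay; uint32_t value; };

uint64_t sc15_mock_scan(void *ctx, uint64_t value_in)
{
    struct sc15_mock *m = ctx;
    if (value_in & SC15_GO)
        return 0;                 /* request accepted, nothing to report yet */
    if (m->delay > 0) {
        m->delay--;
        return 0;                 /* access incomplete */
    }
    return SC15_GO | m->value;    /* access complete, data valid */
}
```

Bounding the loop with max_polls is a design choice of this sketch, so that a target that never completes cannot hang the caller.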


Note 


If Multi-ICE is used, a maximum of 40 bits of any scan chain can be written at a time. 
Because scan chain 15 is 48 bits long, CP15 register writes require two operations to 
write all the required bits and initiate the access. This can be done by first writing 
bits [31:0] with the required data value, and bit 32 to 0. This has the effect of 
presetting the data value field for the next operation. The second operation sets 
bits [47:33] to the required values, and bit 32 to 1 to initiate the access. 
This relies on the specific behavior of scan chain 15 that enables data to be recirculated 
if a value is scanned in with bit 32 set to 0, and there is no pending access. In this case 
the transition through UPDATE-DR does not modify the contents of the scan chain, and 
the value written in can safely be read back out in a subsequent CAPTURE-DR, 
SHIFT-DR sequence. 
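
The two-operation write described in this note can be sketched as follows. The sc15_partial descriptor and sc15_split_write function are hypothetical names used only to show which bits each operation carries. How a given debugger expresses a partial scan is tool-specific and is not modeled here.

```c
#include <stdint.h>

/* Hypothetical descriptor for one partial scan chain write. */
struct sc15_partial {
    unsigned lsb;    /* lowest scan chain bit written */
    unsigned width;  /* number of bits written, at most 40 */
    uint64_t bits;   /* value, aligned so that bit 0 corresponds to 'lsb' */
};

/* Split a CP15 register write into the two operations described in the note. */
void sc15_split_write(uint16_t reg_addr, uint32_t data, struct sc15_partial op[2])
{
    /* First operation: bits [31:0] = data, bit 32 = 0 (NOP, presets the data). */
    op[0].lsb = 0;
    op[0].width = 33;
    op[0].bits = (uint64_t)data;

    /* Second operation: bit 47 = 1 (write), bits [46:33] = register address,
       bit 32 = 1 (initiate the access). */
    op[1].lsb = 32;
    op[1].width = 16;
    op[1].bits = (1ULL << 15) | ((uint64_t)(reg_addr & 0x3FFFu) << 1) | 1ULL;
}
```

Both operations stay within the 40-bit limit, and the second one overwrites bit 32 without disturbing the preset data field.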





The mapping of scan chain 15 to CP15 registers is done in the same way as a CP15 
MRC/MCR operation. Bits [46:33] of the scan chain are mapped onto Opcode 1, 
Opcode 2, CRn, and CRm. 


The mapping of the register address field to the CP15 registers is shown in Table 11-2. 

Table 11-2 Scan chain 15 mapping to CP15 registers 

MRC/MCR instruction field   Scan chain 15 mapping 
Opcode 1                    [46:44] 
Opcode 2                    [43:41] 
CRn                         [40:37] 
CRm                         [36:33] 
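
As an illustration of Table 11-2, the 14-bit register address field can be assembled from the four MRC/MCR instruction fields. The function name sc15_reg_addr is hypothetical. Bit 0 of the returned value corresponds to scan chain bit 33.

```c
#include <stdint.h>

/* Build the 14-bit register address field, bits [46:33] of scan chain 15.
   Opcode 1 -> [46:44], Opcode 2 -> [43:41], CRn -> [40:37], CRm -> [36:33]. */
uint16_t sc15_reg_addr(unsigned opcode1, unsigned opcode2,
                       unsigned crn, unsigned crm)
{
    return (uint16_t)(((opcode1 & 0x7u) << 11) |
                      ((opcode2 & 0x7u) << 8)  |
                      ((crn & 0xFu) << 4)      |
                      (crm & 0xFu));
}
```

For instance, the fields of `MCR p15,0,<Rd>,c7,c0,4` (Opcode 1 = 0, Opcode 2 = 4, CRn = 7, CRm = 0) give a register address field of 0x470.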


Writes to either the cache operations register (CRn = c7) or the TLB operations register 
(CRn = c8), that require a form of address to select an entry to be manipulated, use the 
data value part of the scan chain to provide the address information. The format of the 
address field is identical to that used for the value of Rd, for the equivalent MCR 
instruction. 


Memory system debug operations (CRn = c15), that require an address to be used to 
select an entry, use the value held in the debug address register. See Debug and Test 
Address Register on page B-4. The format of the address field is identical to that used 
for the value of Rd, for the equivalent MCR instruction. 


If an invalid instruction is scanned into scan chain 15, it is translated into a read of the 
ID register. This means that you can check the output data for ID register reads to 
detect that an invalid instruction has been scanned in. 

Chapter 12 
Power Management 


This chapter describes the power management facilities provided by the ARM926EJ-S 
processor. It contains the following section: 


o About power management on page 12-2. 

12.1 About power management 


The power management facilities provided by the ARM926EJ-S processor are: 

o Dynamic power management (wait for interrupt mode) 
o Static power management (leakage control) on page 12-3. 


12.1.1 Dynamic power management (wait for interrupt mode) 


The ARM926EJ-S processor can be put into a low-power state by the wait for interrupt 
instruction: 


MCR p15,0,<Rd>,c7,c0,4 


This instruction switches the ARM926EJ-S processor into a low-power state until either 
an interrupt (IRQ or FIQ) or a debug request occurs. The debug request can either be an 
external debug request EDBGRQ or a debug request made by the debugger by writing 
to the DBGRQ bit of the ARM9EJ-S debug control register using scan chain 2. 


In wait for interrupt mode, all internal ARM926EJ-S clocks can be stopped. The switch 
into the low-power state is delayed until all write buffers have been drained, and the 
ARM926EJ-S memory system is in a quiescent state. 


The switch into low-power state is indicated by the assertion of the STANDBYWFI 
signal. If STANDBYWFI is asserted then it is guaranteed that all of the ARM926EJ-S 
external interfaces (AHB, TCM, and external coprocessor) are in an idle state. The 
STANDBYWFI signal is intended to be used to shut down clocks to other parts of the 
system, such as external coprocessors, that do not have to be clocked if the 
ARM926EJ-S processor is idle. 


The STANDBYWFI signal is deasserted in the second cycle following an interrupt or 
a debug request. It is guaranteed that no form of access on any external interface is 
started until the cycle after STANDBYWFI is deasserted. Figure 12-1 shows the 
deassertion of the STANDBYWFI signal after an IRQ interrupt. 


[Timing diagram: CLK, STANDBYWFI, and nIRQ waveforms] 

Figure 12-1 Deassertion of STANDBYWFI after an IRQ interrupt 


When the ARM926EJ-S has entered a low-power state, all of the main internal clocks 
are stopped, including the clock for the ARM9EJ-S core. However, the ARM9EJ-S is 
active if DBGTCKEN is asserted. This enables values to be written in the ARM9EJ-S 
debug control register so that a debugger can force an exit from wait for interrupt mode. 
This means that you can safely stop the ARM926EJ-S CLK if STANDBYWFI is 
HIGH and DBGTCKEN is LOW. 
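
The stop condition stated above can be written as a one-line predicate. This is a behavioral sketch for a system model or testbench, not the recommended gating logic itself, and the function name arm926_clk_may_stop is hypothetical.

```c
#include <stdbool.h>

/* CLK may be stopped only while the processor signals wait for interrupt
   (STANDBYWFI HIGH) and the debug logic is not being clocked (DBGTCKEN LOW). */
bool arm926_clk_may_stop(bool standbywfi, bool dbgtcken)
{
    return standbywfi && !dbgtcken;
}
```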


Figure 12-2 shows the recommended logic for stopping the main ARM926EJ-S clock 
during wait for interrupt. 


[Logic diagram: inputs nFIQ, EDBGRQ, nIRQ, STANDBYWFI, FCLK, and HRESETn] 
FCLK = Free running clock 
CLK = Clock supplied to ARM926EJ-S macrocell 

Figure 12-2 Logic for stopping ARM926EJ-S clock during wait for interrupt 


The nature of the nFIQ, nIRQ, and EDBGRQ signals enables them to be registered 
prior to being used in the gating logic. DBGTCKEN must be used combinationally to 
maintain the relationship between the ARM926EJ-S JTAG logic and the RTCK signal 
used by the debugger. See the ARM9EJ-S Technical Reference Manual for details of 
how DBGTCKEN is generated and used. 


Caution 

If the ARM926EJ-S is in low-power WFI mode, STANDBYWFI is HIGH, and 
block-level clock gating is not enabled, then external interface 
transactions, for example TCM transactions, cause the STANDBYWFI signal to 
go LOW while the external interfaces are not idle. 

This might be an unwanted side effect in systems that trigger external logic off the 
falling, or rising, edge of the STANDBYWFI signal. 

This effect does not occur if block-level clock gating is enabled. In that case, 
STANDBYWFI stays HIGH even if there is external interface activity. 


12.1.2 Static power management (leakage control) 


The ARM926EJ-S design is partitioned so that the SRAM blocks that are used for the 
caches and the MMU can be powered down under certain conditions. 

Cache RAMs 


The RAMs for either of the caches can be safely powered down if the respective cache 
has been disabled, using CP15 control register c1, and it contains no valid entries. While 
a cache is disabled, only explicit CP15 operations can cause the cache RAMs to be 
accessed (c7 cache maintenance operations). These instructions must not be executed 
while any of the cache RAMs are powered down. If any of the RAMs for a cache have 
been powered down, then they must be powered up prior to re-enabling the relevant 
cache. 
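
The two preconditions for powering down cache RAMs can be expressed as a check against the CP15 c1 control register value. In this sketch the bit positions (C = bit 2, I = bit 12) are assumed from the control register layout, which this section does not restate. The function names are hypothetical, and the has_valid_entries flag is knowledge the caller must supply, for example after cleaning and invalidating the cache.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed CP15 c1 control register bit positions (not stated in this section). */
#define CP15_C1_DCACHE_EN (1u << 2)   /* C bit: DCache enable */
#define CP15_C1_ICACHE_EN (1u << 12)  /* I bit: ICache enable */

/* DCache RAMs may be powered down only if the DCache is disabled and
   holds no valid entries. */
bool dcache_rams_may_power_down(uint32_t c1, bool has_valid_entries)
{
    return (c1 & CP15_C1_DCACHE_EN) == 0 && !has_valid_entries;
}

/* Same rule for the ICache RAMs. */
bool icache_rams_may_power_down(uint32_t c1, bool has_valid_entries)
{
    return (c1 & CP15_C1_ICACHE_EN) == 0 && !has_valid_entries;
}
```

The check covers only the power-down decision. Software must still avoid c7 cache maintenance operations while the RAMs are off, and must power the RAMs up before re-enabling the cache.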


MMU RAMs 


The RAM used to implement the MMU can be safely powered down if the MMU has 
been disabled, using CP15 control register c1, and it contains no valid entries. While the 
MMU is disabled, only explicit CP15 operations can cause the MMU RAM to be 
accessed (c8 TLB maintenance operations, and c15 MMU test/debug operations). 
These instructions must not be executed while the MMU RAM is powered down. The 
MMU RAM must be powered up prior to re-enabling the MMU. 

Appendix A 
Signal Descriptions 


This appendix describes the ARM926EJ-S processor input and output signals. It 
contains the following sections: 


o Signal properties and requirements on page A-2 
o AHB related signals on page A-3 
o Coprocessor interface signals on page A-5 
o Debug signals on page A-7 
o JTAG signals on page A-8 
o Miscellaneous signals on page A-9 
o ETM interface signals on page A-10 
o TCM interface signals on page A-12. 

A.1 Signal properties and requirements 


To ensure ease of integration of the ARM926EJ-S processor into embedded 
applications, and to simplify synthesis flow, the following design techniques have been 
used: 


o a single rising edge clock times all activity 
o all signals and buses are unidirectional 
o all inputs are required to be synchronous to the single clock. 


These techniques simplify the definition of the top-level ARM926EJ-S processor 
signals because all outputs change from the rising edge and all inputs are sampled with 
the rising edge of the clock. In addition, all signals are either input or output only. 
Bidirectional signals are not used. 





Note 


You must use external logic to synchronize asynchronous signals, for example interrupt 
sources, before applying them to the ARM926EJ-S processor. 

A.2 AHB related signals 


Table A-1 describes the ARM926EJ-S processor AHB related signals. 


Table A-1 AHB related signals 

Signal name     Direction  Description 
DHADDR[31:0]    Output     AHB address (data). 
DHBL[3:0]       Output     Byte lane indicator for current transfer. 
DHBURST[2:0]    Output     AHB burst size (data). 
DHBUSREQ        Output     AHB bus request (data). 
DHCLKEN         Input      Signifies the rising edge of HCLK for the data AHB. If CLK and 
                           HCLK are the same frequency, DHCLKEN must be tied HIGH. 
DHGRANT         Input      AHB bus grant signal (data). 
DHLOCK          Output     AHB bus lock signal (data). 
DHPROT[3:0]     Output     AHB bus access information (data). 
DHRDATA[31:0]   Input      AHB read data (data). 
DHREADY         Input      AHB transfer complete signal (data). 
DHRESP[1:0]     Input      AHB transfer response (data). 
DHSIZE[2:0]     Output     AHB transfer size (data), indicating byte, halfword, or word. 
                           DHSIZE[2] is tied LOW. 
DHTRANS[1:0]    Output     AHB transfer type (data). 
DHWDATA[31:0]   Output     AHB write data (data). 
DHWRITE         Output     AHB transfer direction (data). 
HRESETn         Input      AHB reset signal. 
IHADDR[31:0]    Output     AHB address (instruction). 
IHBURST[2:0]    Output     AHB burst size (instruction). 
IHBUSREQ        Output     AHB bus request (instruction). 
IHCLKEN         Input      Signifies the rising edge of HCLK for the instruction AHB. If CLK 
                           and HCLK are the same frequency, IHCLKEN must be tied HIGH. 
IHGRANT         Input      AHB bus grant signal (instruction). 
IHLOCK          Output     AHB bus lock signal (instruction). 
IHPROT[3:0]     Output     AHB bus access information (instruction). 
IHREADY         Input      AHB transfer complete signal (instruction). 
IHRDATA[31:0]   Input      AHB read data (instruction). 
IHRESP[1:0]     Input      AHB transfer response (instruction). 
IHSIZE[2:0]     Output     AHB transfer size (instruction), indicating byte, halfword, or 
                           word. IHSIZE[2] is tied LOW. 
IHTRANS[1:0]    Output     AHB transfer type (instruction). 
IHWRITE         Output     AHB transfer direction (instruction). 

A.3 Coprocessor interface signals 

Table A-2 describes the ARM926EJ-S processor coprocessor interface signals. 


Table A-2 Coprocessor interface signals 

Name            Direction  Description 
CPABORT         Output     Indicates STC/LDC operation aborted. Asserted in WB stage of 
                           coprocessor pipeline. 
CPBURST[3:0]    Input      Indicates number of words to be transferred for LDC/STC 
                           operation. If no external coprocessors are attached, this must 
                           be tied to b0000. 
CPCLKEN         Output     Coprocessor clock enable. When HIGH on the rising edge of CLK, 
                           the pipeline follower logic can advance. 
CPDIN[31:0]     Input      Coprocessor write data. The coprocessor data bus for 
                           transferring data from the coprocessor. 
CPDOUT[31:0]    Output     Coprocessor read data. The coprocessor data bus for 
                           transferring data to the coprocessor. 
CPEN            Input      Coprocessor enable. When LOW, disables the external coprocessor 
                           interface. If CPEN is LOW then CHSDE and CHSEX must both be 
                           driven to b10 (ABSENT response). 
CPINSTR[31:0]   Output     Coprocessor instruction data. The coprocessor instruction bus 
                           that instructions are transferred over to the pipeline follower 
                           in the coprocessor. 
CPPASS          Output     Indicates that there is a coprocessor instruction in the Execute 
                           stage of the pipeline that must be executed. 
CPLATECANCEL    Output     If HIGH during the first Memory cycle of a coprocessor 
                           instruction, then the coprocessor must cancel the instruction 
                           without changing any internal state. 
CHSDE[1:0]      Input      Coprocessor handshake decode. The handshake signals from the 
                           Decode stage of the coprocessor pipeline follower. Indicates 
                           ABSENT (b10), WAIT (b00), GO (b01), or LAST (b11). If no 
                           external coprocessors are attached, this must be tied to b10 
                           (ABSENT response). 
CHSEX[1:0]      Input      Coprocessor handshake execute. The handshake signals from the 
                           Execute stage of the coprocessor pipeline follower. Indicates 
                           ABSENT (b10), WAIT (b00), GO (b01), or LAST (b11). If no 
                           external coprocessors are attached, this must be tied to b10 
                           (ABSENT response). 
nCPINSTRVALID   Output     Coprocessor valid instruction. Valid instruction indicator for 
                           CPINSTR (replaces CPTBIT). 
nCPMREQ         Output     Not coprocessor instruction request. If this signal is LOW on 
                           the rising edge of CLK and CPCLKEN is HIGH, the instruction on 
                           CPINSTR must enter the coprocessor pipeline. 
nCPTRANS        Output     Not coprocessor memory translate. When LOW, the coprocessor 
                           interface is in a nonprivileged state. When HIGH, the 
                           coprocessor interface is in a privileged state. 

A.4 Debug signals 


Table A-3 describes the ARM926EJ-S processor debug signals. 


Table A-3 Debug signals 

Name            Direction  Description 
COMMRX          Output     Communications channel receive. When HIGH, this signal denotes 
                           that the comms channel receive buffer contains valid data 
                           waiting to be read. 
COMMTX          Output     Communications channel transmit. When HIGH, this signal denotes 
                           that the comms channel transmit buffer is empty. 
DBGACK          Output     Debug acknowledge. When HIGH, indicates that the processor is in 
                           debug state. 
DBGDEWPT        Input      Data watchpoint. Asserted by external hardware to halt execution 
                           of the processor for debug purposes. If HIGH at the end of a 
                           data memory request cycle, it causes the ARM926EJ-S processor to 
                           enter debug state. 
DBGEN           Input      Debug enable. Enables the debug features of the processor. This 
                           signal must be tied LOW if debug is not required. 
DBGEXT[1:0]     Input      EmbeddedICE-RT external input. Inputs to the EmbeddedICE-RT 
                           logic that enable breakpoints or watchpoints to be dependent on 
                           external conditions. 
DBGIEBKPT       Input      Instruction breakpoint. Asserted by external hardware to halt 
                           execution of the processor for debug purposes. If HIGH at the 
                           end of an instruction fetch, it causes the ARM926EJ-S processor 
                           to enter debug state if that instruction reaches the Execute 
                           stage of the processor pipeline. 
DBGINSTREXEC    Output     Instruction executed. Indicates that the instruction in the 
                           Execute stage of the processor pipeline has been executed. 
DBGRNG[1:0]     Output     EmbeddedICE-RT range out. Indicates that the corresponding 
                           EmbeddedICE-RT watchpoint register has matched the conditions 
                           currently present on the address, data, and control buses. This 
                           signal is independent of the state of the watchpoint enable 
                           control bit. 
DBGRQI          Output     Internal debug request. Represents the debug request signal that 
                           is presented to the core debug logic. This is a combination of 
                           EDBGRQ and bit 1 of the debug control register. 
EDBGRQ          Input      External debug request. An external debugger can force the 
                           processor into debug state by asserting this signal. 

A.5 JTAG signals 

Table A-4 describes the ARM926EJ-S processor JTAG signals. 


Table A-4 JTAG signals 

Name            Direction  Description 
DBGIR[3:0]      Output     TAP controller instruction register. These four bits reflect the 
                           current instruction loaded into the TAP controller instruction 
                           register. These bits change when the TAP controller is in the 
                           UPDATE-IR state. 
DBGnTRST        Input      Not test reset. This is the active LOW reset signal for the 
                           EmbeddedICE-RT internal state. This signal is a level-sensitive 
                           asynchronous reset input. 
DBGnTDOEN       Output     Not DBGTDO enable. When LOW, indicates that the serial data is 
                           being driven out of the DBGTDO output. Normally used as an 
                           output enable for a DBGTDO pin in a packaged part. 
DBGSCREG[4:0]   Output     These five bits reflect the ID number of the scan chain currently 
                           selected by the TAP controller. These bits change when the TAP 
                           controller is in the UPDATE-DR state. 
DBGSDIN         Output     External scan chain serial input data. Contains the serial data 
                           to be applied to an external scan chain. 
DBGSDOUT        Input      External scan chain serial data output. Contains the serial data 
                           out of an external scan chain. When an external scan chain is 
                           not connected, this signal must be tied LOW. 
DBGTAPSM[3:0]   Output     TAP controller state machine. This bus reflects the current 
                           state of the TAP controller state machine. 
DBGTCKEN        Input      Synchronous test clock enable. 
DBGTDI          Input      Test data input for debug logic. 
DBGTDO          Output     Test data output from debug logic. 
DBGTMS          Input      Test mode select for TAP controller. 

A.6 Miscellaneous signals 

Table A-5 describes the miscellaneous signals on the ARM926EJ-S processor. 


Table A-5 Miscellaneous signals 

Name            Direction  Description 
BIGENDINIT      Input      Determines the setting of the B bit in CP15 c1 after a system 
                           reset. When HIGH, the reset state of the B bit is 1 
                           (big-endian). When LOW, the reset state of the B bit is 0 
                           (little-endian). 
CLK             Input      This clock times all operations of the ARM926EJ-S design. All 
                           outputs change from the rising edge and all inputs are sampled 
                           on the rising edge. The clock can be stretched in either phase. 
                           Through the use of the DHCLKEN and IHCLKEN signals, this clock 
                           also times AHB operations. Through the use of the DBGTCKEN 
                           signal, this clock also controls JTAG and debug operations. 
CFGBIGEND       Output     ARM9EJ-S core endianness configuration. This signal reflects the 
                           setting of the B bit in CP15 c1. When HIGH, the processor treats 
                           bytes in memory as being in big-endian format. When LOW, memory 
                           is treated as little-endian. 
EXTEST          Input      EXTEST mode test signal. This signal must be LOW during normal 
                           operation. 
INTEST          Input      INTEST mode test signal. This signal must be LOW during normal 
                           operation. 
nFIQ            Input      Not fast interrupt request. This is the fast interrupt request 
                           signal. This signal must be synchronous to CLK. 
nIRQ            Input      Not interrupt request. This is the interrupt request signal. 
                           This signal must be synchronous to CLK. 
SCANENABLE      Input      Scan enable test signal. This signal must be LOW during normal 
                           operation. 
STANDBYWFI      Output     When HIGH, indicates that the ARM926EJ-S processor is in wait 
                           for interrupt mode. 
TAPID[31:0]     Input      This is the ARM926EJ-S device identification (ID) code test data 
                           register, accessible from the scan chains. It must be tied to 
                           0x07926F0F for an ARM926EJ-S processor when the device is 
                           instantiated. 
TESTMODE        Input      Test mode test signal. This signal must be LOW during normal 
                           operation. 
VINITHI         Input      Exception vector location at reset. Determines the reset 
                           location of the exception vectors. When LOW, the vectors are 
                           located at 0x00000000. When HIGH, the vectors are located at 
                           0xFFFF0000. 
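
Two of the static configuration inputs in Table A-5 lend themselves to a short worked example. The C sketch below maps VINITHI to the reset vector base and splits a TAPID value into fields. The field split assumes the standard IEEE 1149.1 device identification layout (version in [31:28], part number in [27:12], manufacturer identity in [11:1], bit 0 fixed at 1), which Table A-5 does not spell out. All function and type names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reset location of the exception vectors, selected by VINITHI. */
uint32_t arm926_reset_vector_base(bool vinithi)
{
    return vinithi ? 0xFFFF0000u : 0x00000000u;
}

/* Fields of a JTAG device identification code, assuming the
   IEEE 1149.1 layout. */
struct jtag_idcode {
    unsigned version;       /* bits [31:28] */
    unsigned part;          /* bits [27:12] */
    unsigned manufacturer;  /* bits [11:1]  */
    unsigned marker;        /* bit  [0], 1 for a valid IDCODE */
};

struct jtag_idcode jtag_idcode_decode(uint32_t id)
{
    struct jtag_idcode f;
    f.version      = (id >> 28) & 0xFu;
    f.part         = (id >> 12) & 0xFFFFu;
    f.manufacturer = (id >> 1)  & 0x7FFu;
    f.marker       = id & 0x1u;
    return f;
}
```

Decoding the TAPID tie-off value 0x07926F0F under this layout gives a part number field of 0x7926.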
A.7 ETM interface signals 

Table A-6 describes the ARM926EJ-S processor ETM interface signals. 

Table A-6 ETM interface signals 

Name                 Direction  Description 
ETMBIGEND            Output     ETM big-endian configuration indication. 
ETMCHSD[1:0]         Output     ETM coprocessor handshake decode signals. 
ETMCHSE[1:0]         Output     ETM coprocessor handshake execute signals. 
ETMDA[31:0]          Output     ETM data address. 
ETMDABORT            Output     ETM data abort. 
ETMDBGACK            Output     ETM debug mode indication. 
ETMDMAS[1:0]         Output     ETM data size indication. 
ETMDMORE             Output     ETM more sequential data indication. 
ETMDnMREQ            Output     ETM data memory request. 
ETMDnRW              Output     ETM data not read/write. 
ETMDSEQ              Output     ETM sequential data indication. 
ETMEN                Input      Synchronous ETM interface enable. This signal must be tied 
                                LOW if an ETM is not used. 
ETMHIVECS            Output     ETM exception vectors configuration. 
ETMIA[31:0]          Output     ETM instruction address. 
ETMIABORT            Output     ETM instruction abort. 
ETMID15TO11[15:11]   Output     ETM instruction data field bits [15:11]. 
ETMID31TO25[31:25]   Output     ETM instruction data field bits [31:25]. 
ETMIJBIT             Output     ETM Jazelle state indication. 
ETMInMREQ            Output     ETM instruction memory request. 
ETMINSTREXEC         Output     ETM instruction execute indication. 
ETMINSTRVALID        Output     ETM instruction valid indication. 
ETMISEQ              Output     ETM sequential instruction access. 
ETMITBIT             Output     ETM Thumb state indication. 
ETMLATECANCEL        Output     ETM coprocessor late cancel indication. 
ETMnWAIT             Output     ETM clock stall signal. 
ETMPASS              Output     ETM coprocessor instruction execute indication. 
ETMPROCID[31:0]      Output     ETM process identifier. 
ETMPROCIDWR          Output     ETMPROCID write strobe. 
ETMRDATA[31:0]       Output     ETM read data. 
ETMRNGOUT[1:0]       Output     ETM watchpoint register match indication. 
ETMWDATA[31:0]       Output     ETM write data. 
ETMZIFIRST           Output     Indicates the current Decode cycle is the first being traced 
                                for the current Java instruction. 
ETMZILAST            Output     Indicates the current Decode cycle is the last being traced 
                                for the current Java instruction. 
FIFOFULL             Input      ETM FIFO full. This signal must be tied LOW if an ETM is 
                                not used. 

A.8 TCM interface signals 


Table A-7 describes the ARM926EJ-S TCM interface signals. 


Table A-7 TCM interface signals 

Signal            Direction  Function 
DRADDR[17:0]      Output     Data TCM address. This is the word address for the access. 
                             Valid during request cycles. 
DRCS              Output     Chip select. Indicates if an access will take place in the 
                             following cycle. Not valid during wait cycles. 
DRDMAADDR[17:0]   Input      Direct memory access address for DTCM memory. If DRDMAEN is 
                             set to 1, then the value of DRDMAADDR is routed directly 
                             through to DRADDR. 
DRDMAEN           Input      DMA access cycle. If asserted, DRADDR is directly sourced from 
                             DRDMAADDR, and DRCS is the result of logically ORing DRDMACS 
                             with the chip select value for the current TCM access. 
DRDMACS           Input      Direct memory access chip-select for DTCM. 
DRIDLE            Output     Data TCM interface idle: 
                             0 = TCM access 
                             1 = no access will take place in the current cycle or TCM 
                                 disabled. 
                             Not valid for DMA accesses. 
DRnRW             Output     Data TCM read not write: 
                             0 = read 
                             1 = write. 
                             Indicates if the access is a read or write. Valid during 
                             request cycles. 
DRRD[31:0]        Input      Data TCM read data. Valid during non-waited data cycles. 
DRSEQ             Output     Request sequential. Valid during request cycles, asserted 
                             during wait cycles. Indicates that the address in the current 
                             cycle is sequential to the address used during the previous 
                             request cycle. 
DRSIZE[3:0]       Input      Data TCM size. Static configuration input that specifies the 
                             physical size of TCM memories attached. 
                             0000 = absent 
                             0011 = 4KB 
                             0100 = 8KB 
                             ... 
                             1010 = 512KB 
                             1011 = 1MB 
                             Values 0001, 0010, and 1100 to 1111 are reserved. 
DRWAIT            Input      Data TCM wait state input. If HIGH, the DTCM cannot service 
                             the request in that cycle. Valid in request cycle and 
                             subsequent wait cycles. Ignored if not a request or wait 
                             cycle. 
DRWBL[3:0]        Output     Data TCM write data byte lane indicator. Valid during request 
                             cycles. For reads, set to b0000. For writes, indicates which 
                             byte(s) are to be written, depending on the address and the 
                             size of the access (word, halfword, or byte). Bits of DRWBL 
                             are set only when a write is taking place, so when DRnRW is 
                             unset all the bits of DRWBL are also unset. 
DRWD[31:0]        Output     Data TCM write data. Valid during request cycles when DRnRW 
                             is 0. Valid during waited write cycles. 
INITRAM           Input      Enables instruction TCM at system reset. Enables booting from 
                             the instruction TCM if VINITHI is LOW. 
IRADDR[17:0]      Output     Instruction TCM address. This is the word address for the 
                             access. Valid during request cycles. 
IRCS              Output     Chip select. Indicates if an access will take place in the 
                             following cycle. Not valid during wait cycles. 
IRDMAADDR[17:0]   Input      DMA access cycle. If asserted, IRADDR is directly sourced from 
                             IRDMAADDR, and IRCS is the result of logically ORing IRDMACS 
                             with the chip select value for the current TCM access. 
IRDMAEN           Input      Enables direct memory access to the ITCM memory using the 
                             IRDMAADDR and IRDMACS inputs. 
IRDMACS           Input      Direct memory access chip-select for ITCM. 
IRIDLE            Output     Instruction TCM interface idle: 
                             0 = TCM access 
                             1 = no access takes place in the current cycle or TCM 
                                 disabled. 
                             Not valid for DMA accesses. 
IRnRW             Output     Instruction TCM read not write: 
                             0 = read 
                             1 = write. 
                             Indicates if the access is a read or write. Valid during 
                             request cycles. 
IRRD[31:0]        Input      Instruction TCM read data. Valid during non-waited data 
                             cycles. 
IRSEQ             Output     Request sequential. Valid during request cycles, asserted 
                             during wait cycles. Indicates that the address in the current 
                             cycle is sequential to the address used during the previous 
                             request cycle. IRSEQ is not valid following ITCM DMA accesses. 
IRSIZE[3:0]       Input      Instruction TCM size. Static configuration input that 
                             specifies the physical size of TCM memories attached. 
                             0000 = absent 
                             0011 = 4KB 
                             0100 = 8KB 
                             ... 
                             1010 = 512KB 
                             1011 = 1MB 
                             Values 0001, 0010, and 1100 to 1111 are reserved. 
IRWAIT            Input      Instruction TCM wait state input. If HIGH, the ITCM cannot 
                             service the request in that cycle. Valid in request cycle and 
                             subsequent wait cycles. Ignored if not a request or wait 
                             cycle. 
IRWBL[3:0]        Output     Instruction TCM write data byte lane indicator. Valid during 
                             request cycles. For reads, set to b0000. For writes, indicates 
                             which byte(s) are to be written, depending on the address and 
                             the size of the access (word, halfword, or byte). Bits of 
                             IRWBL are set only when a write is taking place, so when 
                             IRnRW is unset all the bits of IRWBL are also unset. 
IRWD[31:0]        Output     Instruction TCM write data. Valid during request cycles when 
                             IRnRW is 0. Valid during waited write cycles. 
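
The DRSIZE and IRSIZE encodings listed in Table A-7 fit the pattern size = 512 bytes shifted left by the field value. The sketch below decodes the field on that basis. It assumes that the encodings between 0100 and 1010, which the table elides, follow the same power-of-two progression. The function name tcm_size_bytes is hypothetical.

```c
/* Decode a 4-bit DRSIZE or IRSIZE value.
   Returns the TCM size in bytes, 0 if the TCM is absent,
   or -1 for a reserved encoding. */
long tcm_size_bytes(unsigned size_field)
{
    size_field &= 0xFu;
    if (size_field == 0)
        return 0;                       /* 0000 = absent */
    if (size_field < 3 || size_field > 11)
        return -1;                      /* 0001, 0010, 1100 to 1111 reserved */
    return 512L << size_field;          /* 0011 = 4KB ... 1011 = 1MB */
}
```

The listed end points agree with the formula: 0011 gives 4096 bytes and 1011 gives 1048576 bytes.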
Appendix B 
CP15 Test and Debug Registers 


This appendix describes the ARM926EJ-S CP15 Test and Debug Registers. It contains 
the following section: 


o About the Test and Debug Registers on page B-2. 

B.1 About the Test and Debug Registers 


The ARM926EJ-S Test and Debug Registers, CP15 c15, provide additional 
device-specific test operations. You can use the registers to access and control the 
following: 


o Debug Override Register 
o Debug and Test Address Register on page B-4 
o Trace Control Register on page B-5 
o MMU test operations on page B-5 
o Cache Debug Control Register on page B-12 
o MMU Debug Control Register on page B-13 
o Memory Region Remap Register on page B-15. 


You must only use these operations for test. The ARM Architecture Reference Manual 
describes this register as implementation-defined. 


The format of the CP15 test and debug operations is: 
MCR/MRC p15, <Opcode 1>, <Rd>, c15, <CRm>, <Opcode 2> 


The MRC and MCR bit pattern is shown in Figure B-1. 


Bits      Field 
[31:28]   Cond 
[27:24]   1 1 1 0 
[23:21]   Opcode 1 
[20]      L 
[19:16]   CRn 
[15:12]   Rd 
[11:8]    1 1 1 1 
[7:5]     Opcode 2 
[4]       1 
[3:0]     CRm 

Figure B-1 CP15 MRC and MCR bit pattern 


The L bit distinguishes between an MCR (L = 0) and an MRC (L = 1). 
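
The bit pattern can be checked with a small encoder. This sketch follows the ARM architecture encoding of coprocessor register transfers, in which bit 20 is clear for MCR and set for MRC, and the coprocessor number field [11:8] is 15 for CP15. The function name cp15_encode is hypothetical.

```c
#include <stdint.h>

/* Encode an MCR (is_mrc = 0) or MRC (is_mrc = 1) instruction that
   targets coprocessor 15. 'cond' is the 4-bit condition field,
   for example 0xE for "always". */
uint32_t cp15_encode(int is_mrc, unsigned cond, unsigned opcode1,
                     unsigned rd, unsigned crn, unsigned crm,
                     unsigned opcode2)
{
    return ((uint32_t)(cond & 0xFu) << 28)    |
           (0xEu << 24)                       |  /* bits [27:24] = 1110 */
           ((uint32_t)(opcode1 & 0x7u) << 21) |
           ((uint32_t)(is_mrc ? 1u : 0u) << 20) |  /* L bit */
           ((uint32_t)(crn & 0xFu) << 16)     |
           ((uint32_t)(rd & 0xFu) << 12)      |
           (0xFu << 8)                        |  /* coprocessor number 15 */
           ((uint32_t)(opcode2 & 0x7u) << 5)  |
           (1u << 4)                          |
           (uint32_t)(crm & 0xFu);
}
```

With cond = 0xE and Rd = r0, `MCR p15,0,r0,c15,c0,0` encodes to 0xEE0F0F10 and the corresponding MRC to 0xEE1F0F10.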


B.1.1 Debug Override Register 


You can use the Debug Override Register to modify the behavior of the ARM926EJ-S 
core from the default behavior. 


The function of each ARM926EJ-S Debug Override Register bit is shown in Table B-1 
on page B-3. 


The Debug Override Register can be accessed by using the following instructions: 


MRC{cond} p15,0,<Rd>,c15,c0,0 ; Read Debug Override Register 
MCR{cond} p15,0,<Rd>,c15,c0,0 ; Write Debug Override Register 

The reset state of the Debug Override Register is 0x0. 


Table B-1 Debug Override Register

Bits      Function or name                        Description
[31:20]   Reserved                                Read = Unpredictable
                                                  Write = Should Be Zero
[19]      Test and clean all                      0 = Default behavior for test and clean instructions
                                                  1 = Modifies the behavior of test and clean, and test, clean, and
                                                  invalidate instructions so that they act on the complete cache
[18]      Abort data TLB miss                     0 = Do not abort DTLB miss
                                                  1 = Abort DTLB miss
[17]      Abort instruction TLB miss              0 = Do not abort ITLB miss
                                                  1 = Abort ITLB miss
[16]      Disable NC instruction prefetching      0 = Enable prefetching
                                                  1 = Disable prefetching
[15]      Disable block-level clock gating        0 = Enable block-level clock gating
                                                  1 = Disable block-level clock gating
[14]      Disable NCB stores (force NCNB)         0 = Enable NCB stores
                                                  1 = Disable NCB stores (force NCNB)
[13]      MMU disabled, DCache enabled behavior   0 = If MMU disabled, level one access NCNB
                                                  1 = If MMU disabled and DCache enabled, level one access WT
[12:0]    Reserved                                Read = Unpredictable
                                                  Write = Should Be Zero


Bit 13, MMU disabled, DCache enabled behavior


This bit changes the behavior when the MMU is disabled but the DCache
is enabled. During normal operation, if the MMU is disabled, all data
accesses are treated as being NCNB. If bit 13 is set with the MMU
disabled and the DCache enabled, all data accesses are treated as WT.


Note


This behavior can be overridden using the Memory Region Remap Register.


Bit 14, disable NCB stores (force NCNB)


You can use this bit to force all NCB stores to be treated as NCNB stores 
at level one. This bit overrides the settings in both the MMU page tables 
and the memory region remap register. 


Bit 15, disable block-level clock gating 


You can use this bit to disable block-level clock gating with the 
ARM926EJ-S processor. This bit does not affect the functionality of the 
ARM926EJ-S processor. It allows the benefits of block-level clock gating 
to be evaluated without the requirement to build two different 
implementations of the ARM926EJ-S macrocell, one with block-level 
clock gating, one without. 


Bit 16, disable NC instruction prefetching


You can use this bit to disable speculative prefetching for instructions in
noncacheable areas of memory. The default behavior of the ARM926EJ-S
processor is to perform speculative sequential instruction fetches on the
AHB interface. Disabling prefetching prevents any speculative
noncacheable instruction prefetches by the ARM926EJ-S memory
system, and only instruction requests issued by the ARM9EJ-S core
result in instruction fetches on the AHB interface.


Bits 17 and 18, abort instruction TLB miss and abort data TLB miss


You can use the abort data TLB miss and abort instruction TLB miss bits
to prevent page table walks occurring as the result of a TLB miss. When
set, a TLB miss results in the access being aborted as if the access had
resulted in a translation fault, and a value of 0000 being written into the
status field of the appropriate FSR.


Bit 19, test and clean all


You can use the test-and-clean-all bit to modify the behavior of the test
and clean, and test, clean, and invalidate instructions so that a single
instruction can be used to clean or clean and invalidate the entire cache.
This is only intended for use by a debugger, to provide an efficient way
to clean the data cache using scan chain 15.
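

The bit assignments in Table B-1 can be captured as masks so that a debugger or test harness builds the register value without touching reserved bits. The following is only a sketch: the macro and function names are hypothetical, and the helper computes the value that would be passed in Rd to the MCR instruction shown earlier rather than performing the coprocessor write itself.

```c
#include <stdint.h>

/* Bit positions taken from Table B-1. The names are illustrative only. */
#define DBG_OVR_MMU_OFF_DCACHE_WT  (1u << 13)
#define DBG_OVR_FORCE_NCNB         (1u << 14)
#define DBG_OVR_NO_CLOCK_GATING    (1u << 15)
#define DBG_OVR_NO_NC_PREFETCH     (1u << 16)
#define DBG_OVR_ABORT_ITLB_MISS    (1u << 17)
#define DBG_OVR_ABORT_DTLB_MISS    (1u << 18)
#define DBG_OVR_TEST_CLEAN_ALL     (1u << 19)

/* Bits [19:13] are the only bits Table B-1 defines. */
#define DBG_OVR_DEFINED_MASK       (0x7Fu << 13)

/* Return a value suitable for writing: requested flags kept,
 * reserved bits [31:20] and [12:0] forced to zero. */
static uint32_t debug_override_value(uint32_t flags)
{
    return flags & DBG_OVR_DEFINED_MASK;
}
```

Masking against the defined bits is one way to respect the Should Be Zero rule for the reserved fields.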


B.1.2 Debug and Test Address Register


This register defines the address used for debug and test operations, and for MMU test
operations using the MMU Test Register.


You can access the Debug and Test Address Register using the following instructions:


MRC{cond} p15,0,<Rd>,c15,c1,0 ; Read Debug and Test Address Register
MCR{cond} p15,0,<Rd>,c15,c1,0 ; Write Debug and Test Address Register


B.1.3 Trace Control Register


You can access the Trace Control Register by using the following instructions:


MCR p15, 1, <Rd>, c15, c1, 0 ; Write Trace Control Register
MRC p15, 1, <Rd>, c15, c1, 0 ; Read Trace Control Register


You can use the Trace Control Register to determine under what conditions the
ARM9EJ-S core is stalled when the FIFOFULL signal is asserted.


Usually, non-invasive real-time trace requires the presence of an nFIQ or nIRQ
interrupt to prevent the ARM9EJ-S core being stalled by FIFOFULL being asserted.


The Trace Control Register enables you to modify this behavior, so that the presence of
an interrupt does not prevent the ARM9EJ-S core being stalled if FIFOFULL is
asserted.


Table B-2 shows the bit assignments for the Trace Control Register. Bits [2:1] of this
register are reset to 0.


Table B-2 Trace Control Register bit assignments

Bits     Content
[31:3]   Reserved (Should Be Zero)
[2]      1 = FIQ interrupt does not prevent FIFOFULL from stalling the ARM9EJ-S core
         0 = FIQ interrupt prevents FIFOFULL from stalling the ARM9EJ-S core
[1]      1 = IRQ interrupt does not prevent FIFOFULL from stalling the ARM9EJ-S core
         0 = IRQ interrupt prevents FIFOFULL from stalling the ARM9EJ-S core
[0]      Reserved (Should Be Zero)
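

As a small illustration of Table B-2, the value written to the Trace Control Register can be assembled from two flags. This is a sketch under the assumption that the caller only needs bits [2:1]; the function name and parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a Trace Control Register value from Table B-2.
 * A true argument sets the corresponding bit to 1, meaning that the
 * interrupt does NOT prevent FIFOFULL from stalling the core. */
static uint32_t trace_control_value(bool stall_during_fiq, bool stall_during_irq)
{
    uint32_t value = 0;
    if (stall_during_fiq) {
        value |= 1u << 2;   /* bit [2]: FIQ */
    }
    if (stall_during_irq) {
        value |= 1u << 1;   /* bit [1]: IRQ */
    }
    return value;           /* bits [31:3] and [0] remain zero */
}
```

Passing false for both arguments reproduces the reset state of bits [2:1].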

B.1.4 MMU test operations


The MMU test operations support accessing TLB structures in the MMU and are used
in conjunction with the Debug and Test Address Register.


You can access the MMU test operations using the instructions in Table B-3.


Table B-3 MMU test operation instructions

Instruction                      Operation
MRC p15, 4/5, <Rd>, c15, c2, 0   Read tag in main TLB entry
MCR p15, 4/5, <Rd>, c15, c3, 0   Write tag in main TLB entry
MRC p15, 4/5, <Rd>, c15, c4, 0   Read PA and access permission data in main TLB entry
MCR p15, 4/5, <Rd>, c15, c5, 0   Write PA and access permission data in main TLB entry
MCR p15, 4/5, <Rd>, c15, c7, 0   Transfer main TLB entry into RAM
MRC p15, 4/5, <Rd>, c15, c2, 1   Read tag in lockdown TLB entry
MCR p15, 4/5, <Rd>, c15, c3, 1   Write tag in lockdown TLB entry
MRC p15, 4/5, <Rd>, c15, c4, 1   Read PA and access permission data in lockdown TLB entry
MCR p15, 4/5, <Rd>, c15, c5, 1   Write PA and access permission data in lockdown TLB entry
MCR p15, 4/5, <Rd>, c15, c7, 1   Transfer lockdown TLB entry into RAM


Inserting or reading entries in the main TLB

Use this procedure to access entries in the main TLB:


1. Use the following Debug and Test Address Register instruction to access a main
TLB entry:


MCR p15, 0, <Rd>, c15, c1, 0 ; select TLB entry

The Rd register selects the main TLB entry as Figure B-2 shows.


31    30        15 14            10 9        0
Way   SBZ          Indexed entry    SBZ


Figure B-2 Rd format for selecting main TLB entry


Table B-4 describes the Rd register entry-select bit fields.


Table B-4 Encoding of the main TLB entry-select bit fields

Bit       Name            Definition
[31]      Way             Way select:
                          1 = way 1
                          0 = way 0.
[30:15]   -               Should Be Zero.
[14:10]   Indexed entry   Indexed entry in main TLB.
[9:0]     -               Should Be Zero.


2. Use the following MMU test operation instructions to access the MVA tag:


MRC p15, 4/5, <Rd>, c15, c2, 0 ; read tag in main TLB
MCR p15, 4/5, <Rd>, c15, c3, 0 ; write tag in main TLB


The Rd register contains the read or write data as Figure B-3 shows.


31                 10 9     5 4   3             0
MVA tag               SBZ     V   Size of entry


Figure B-3 Rd format for accessing MVA tag of main or lockdown TLB entry


Table B-5 describes the MVA tag access bit fields in the Rd register.


Table B-5 Encoding of the TLB MVA tag bit fields

Bit       Name            Definition
[31:10]   MVA tag         Modified virtual address.
[9:5]     -               Should Be Zero.
[4]       V               Valid bit.
[3:0]     Size of entry   Size of entry:
                          b0001 = 1KB page or 1KB subpage of 4KB page
                          b0011 = 4KB page
                          b0101 = 16KB subpage of 64KB page
                          b0111 = 64KB page
                          b1011 = 1MB section.


3. Use the following MMU Test Register instructions to access the PA and access
permission data:


MRC p15, 4/5, <Rd>, c15, c4, 0 ; read PA and access permission data
MCR p15, 4/5, <Rd>, c15, c5, 0 ; write PA and access permission data


The Rd register contains the read or write data as shown in Figure B-4.


31            10 9   8 7      4 3  2 1 0
PA               SBZ   Domain   AP   C B


Figure B-4 Rd format for accessing PA and AP data of main or lockdown TLB entry


Table B-6 describes the PA and access permission bit fields in the Rd register. 


Table B-6 Encoding of the TLB entry PA and AP bit fields

Bit       Name            Definition
[31:10]   PA              Physical address.
[9:8]     -               Should Be Zero.
[7:4]     Domain select   Domain select:
                          b0000 = D0
                          b0001 = D1
                          ...
                          b1110 = D14
                          b1111 = D15.
[3:2]     AP              Access permission:
                          b00 = No access.
                          b01 = Privileged, read/write. User, no access.
                          b10 = Privileged, read/write. User, read-only.
                          b11 = Privileged, read/write. User, read/write.
[1]       C               Cacheable bit.
[0]       B               Bufferable bit.


4. Use the following instruction to complete a write to an entry:


MCR p15, 4/5, <Rd>, c15, c7, 0 ; transfer main storage into RAM


To write an entry into the 2-way main TLB, the full sequence is therefore:


MCR p15, 4/5, <Rd>, c15, c3, 0 ; write tag main TLB storage reg
MCR p15, 4/5, <Rd>, c15, c5, 0 ; write PA/PROT main TLB storage reg
MCR p15, 4/5, <Rd>, c15, c7, 0 ; transfer main storage into RAM


To read an entry from the 2-way main TLB, the entry must first be written. The entry
can then be read using the following instructions:


MRC p15, 4/5, <Rd>, c15, c2, 0 ; read tag main TLB
MRC p15, 4/5, <Rd>, c15, c4, 0 ; read PA/PROT main TLB
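

The three Rd formats used by this procedure (entry select, MVA tag, and PA with access permissions) can be sketched as plain encoder functions. The function names, the enum, and the parameter order are assumptions made for illustration; only the bit positions come from Tables B-4, B-5, and B-6. The functions compute the Rd values and do not issue any coprocessor instruction.

```c
#include <stdint.h>

/* Entry-select word for the Debug and Test Address Register (Table B-4):
 * way in bit [31], indexed entry in bits [14:10]. */
static uint32_t main_tlb_select(unsigned way, unsigned index)
{
    return ((uint32_t)(way & 1u) << 31) | ((uint32_t)(index & 0x1Fu) << 10);
}

/* Size-of-entry encodings from Table B-5. */
enum tlb_size {
    TLB_SIZE_1KB  = 0x1,
    TLB_SIZE_4KB  = 0x3,
    TLB_SIZE_16KB = 0x5,
    TLB_SIZE_64KB = 0x7,
    TLB_SIZE_1MB  = 0xB
};

/* MVA tag word (Table B-5): tag in [31:10], valid in [4], size in [3:0]. */
static uint32_t tlb_tag_word(uint32_t mva, int valid, enum tlb_size size)
{
    return (mva & 0xFFFFFC00u) | (valid ? (1u << 4) : 0u) | ((uint32_t)size & 0xFu);
}

/* PA and access permission word (Table B-6): PA in [31:10], domain in
 * [7:4], AP in [3:2], cacheable in [1], bufferable in [0]. */
static uint32_t tlb_pa_word(uint32_t pa, unsigned domain, unsigned ap,
                            int cacheable, int bufferable)
{
    return (pa & 0xFFFFFC00u)
         | ((uint32_t)(domain & 0xFu) << 4)
         | ((uint32_t)(ap & 0x3u) << 2)
         | (cacheable ? 2u : 0u)
         | (bufferable ? 1u : 0u);
}
```

For example, selecting indexed entry 5 in way 1 gives 0x80001400, which would be written with the select instruction in step 1 before the tag and PA words are written.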


The data RAM attached to the main MMU is 112 bits wide. Table B-7 shows how the
fields of a main TLB write map onto MMUxWD[111:0].


Table B-7 Main TLB mapping to MMUxWD

Way   MMUxWD bits   Description
1     [111:90]      TAG[31:10]
      [89:86]       Size of entry
      [85:64]       PA[31:10]
      [63:60]       Domain select [3:0]
      [59:58]       AP[1:0]
      [57]          Cacheable bit
      [56]          Bufferable bit
0     [55:34]       TAG[31:10]
      [33:30]       Size of entry
      [29:8]        PA[31:10]
      [7:4]         Domain select [3:0]
      [3:2]         AP[1:0]
      [1]           Cacheable bit
      [0]           Bufferable bit


During writes, the data is replicated so that each way receives the same copy of the data.
The exact way that is written and the exact index of the way are specified in the Debug
and Test Address Register.
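

The layout in Table B-7 places the same 56-bit slice in each way. A sketch of how one slice could be assembled from the two Rd words is shown below, for example in a simulation testbench that checks the value seen on MMUxWD. The function name is hypothetical, and the inputs are assumed to follow the Rd formats of Table B-5 and Table B-6. The valid bit of the tag word has no position in Table B-7, so it is not packed here.

```c
#include <stdint.h>

/* Pack one way's 56-bit slice of MMUxWD as laid out in Table B-7.
 * For way 0 the slice occupies MMUxWD[55:0]; for way 1 the same
 * slice occupies MMUxWD[111:56]. */
static uint64_t mmu_way_slice(uint32_t tag_word, uint32_t pa_word)
{
    uint64_t tag  = tag_word >> 10;    /* TAG[31:10], 22 bits        */
    uint64_t size = tag_word & 0xFu;   /* size of entry, 4 bits      */
    uint64_t pa   = pa_word >> 10;     /* PA[31:10], 22 bits         */
    uint64_t attr = pa_word & 0xFFu;   /* domain, AP, C, B as [7:0]  */

    return (tag << 34) | (size << 30) | (pa << 8) | attr;
}
```

Because the data is replicated to both ways, the full 112-bit bus value is this slice repeated in the upper and lower halves.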

Figure B-5 shows what happens during a write to the data RAM attached to the main
MMU.


[Timing diagram: CLK, MMUxCS, MMUxADDR, MMUxWE, MMUxWD (WDATA), MMUxRD (RDATA), MMUxOE]


Figure B-5 Write to the data RAM


Note

On the rising clock edge when MMUxCS=1, the data on MMUxWD is written into the
data RAM. The exact index is on MMUxADDR, as specified in the Debug and Test
Address Register. The lanes written are controlled by the MMUxWE[3:0] pins. The
mapping is as follows:


MMUxWE[0]: 0 = read, 1 = write MMUxWD[29:0] into RAM
MMUxWE[1]: 0 = read, 1 = write MMUxWD[55:30] into RAM
MMUxWE[2]: 0 = read, 1 = write MMUxWD[85:57] into RAM
MMUxWE[3]: 0 = read, 1 = write MMUxWD[111:86] into RAM


In the case of the main MMU, the output enable MMUxOE is driven at all
times. The MMUxRD data bus must be strongly driven at all times. The controller
samples the data from the MMUxRD data bus when a read is being performed.


Inserting or reading entries in the lockdown TLB

Use this procedure to access entries in the lockdown TLB:


1. Use the following Debug and Test Address Register instruction to access a
lockdown TLB entry:

MCR p15, 0, <Rd>, c15, c1, 0


The Rd register selects the lockdown TLB entry as shown in Figure B-6.


Copyright © 2001-2008 ARM Limited. All rights reserved. ARM DDI 0198E


31   29 28           26 25     0
SBZ     Indexed entry   SBZ


Figure B-6 Rd format for selecting lockdown TLB entry


Table B-8 describes the entry-select bit fields in the Rd register.


Table B-8 Encoding of the lockdown TLB entry-select bit fields

Bit       Name            Definition
[31:29]   -               Should Be Zero
[28:26]   Indexed entry   Indexed entry in lockdown TLB
[25:0]    -               Should Be Zero


2. Use the following MMU Test Register instructions to access the MVA tag:

MRC p15, 4, <Rd>, c15, c2, 1 ; read lockdown TLB
MCR p15, 4, <Rd>, c15, c3, 1 ; write lockdown TLB

See Figure B-3 for read or write data in the Rd register.


3. Use the following MMU Test Register instructions to read or write the PA and
access permission data:

MRC p15, 4, <Rd>, c15, c4, 1 ; read PA and access permission data
MCR p15, 4, <Rd>, c15, c5, 1 ; write PA and access permission data

See Figure B-4 for the read or write data in the Rd register.


4. Use the following instruction to complete a write to an entry:

MCR p15, 4, <Rd>, c15, c7, 1 ; transfer lockdown storage into RAM


To write an entry into the lockdown TLB, the full sequence is therefore:


MCR p15, 4/5, <Rd>, c15, c3, 1 ; write tag lockdown TLB storage reg
MCR p15, 4/5, <Rd>, c15, c5, 1 ; write PA/PROT lockdown TLB storage reg
MCR p15, 4/5, <Rd>, c15, c7, 1 ; transfer lockdown storage into RAM


To read an entry from the lockdown TLB, the entry must first be written. The entry can
then be read using the following instructions:


MRC p15, 4/5, <Rd>, c15, c2, 1 ; read tag lockdown TLB
MRC p15, 4/5, <Rd>, c15, c4, 1 ; read PA/PROT lockdown TLB


The data to be written or read is placed in ARM register Rd with the format shown in
Figure B-4.
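

The lockdown entry-select word differs from the main TLB form only in where the index sits. A minimal sketch, with a hypothetical function name and bit positions taken from Table B-8:

```c
#include <stdint.h>

/* Entry-select word for a lockdown TLB entry (Table B-8):
 * indexed entry in bits [28:26], all other bits zero. */
static uint32_t lockdown_tlb_select(unsigned index)
{
    return (uint32_t)(index & 0x7u) << 26;
}
```

The three-bit field means that an index above 7 cannot be encoded; this sketch simply masks such values.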

B.1.5 Cache Debug Control Register


The Cache Debug Control Register is used to force specific cache behavior required for
debug.


The following instructions can be used to access the Cache Debug Control Register:


MRC{cond} p15,7,<Rd>,c15,c0,0 ; read cache debug control register
MCR{cond} p15,7,<Rd>,c15,c0,0 ; write cache debug control register


The Cache Debug Control Register format is shown in Figure B-7.


31        3 2     1     0
SBZ         DWB   DIL   DDL


Figure B-7 Cache Debug Control Register format


The Cache Debug Control Register bit assignments are listed in Table B-9. The reset
value of the Cache Debug Control Register is 0x0.


Table B-9 Cache Debug Control Register bit assignments

Bit      Name   Function                        Description
[31:3]   -      Reserved                        Read = Unpredictable
                                                Write = Should Be Zero
[2]      DWB    Disable write-back (force WT)   0 = Enable write-back behavior
                                                1 = Force write-through behavior
[1]      DIL    Disable ICache linefill         0 = Enable ICache linefills
                                                1 = Disable ICache linefills
[0]      DDL    Disable DCache linefill         0 = Enable DCache linefills
                                                1 = Disable DCache linefills


Forcing write-through behavior


Setting the DWB bit to 1 forces the DCache to treat all cacheable accesses as though
they were in a write-through region of memory. The setting of the DWB bit overrides
any setting specified in either the MMU page tables or in the Memory Region Remap
Register.


If the cache contains dirty cache lines, these remain dirty while the DWB bit is set,
unless they are written back because of a write-back eviction after a linefill, or because
of an explicit clean operation.


Lines that are clean are not marked as dirty if they are updated while the DWB bit is set.
This functionality enables a debugger to download code or data to external memory,
without the requirement to clean part or all of the DCache to ensure that the code or data
being downloaded has been written to external memory.


Note


If the DWB bit is set, and a write is made to a cache line that is dirty, then both the cache
line and external memory are updated with the write data. Other entries in the cache line
still have to be written back to main memory to achieve coherency.


Disabling cache linefills


Setting the DDL and DIL bits prevents the relevant cache from updating when
performing a linefill on a miss. When set, a linefill is performed on a cache miss, reading
eight words from external memory, but the cache is not updated with the linefill data.
The memory region mapping is unchanged. This mode of operation is required for
debug so that the memory image, as seen by the ARM9EJ-S core, can be examined in
a non-invasive manner. Cache hits from a cacheable region read data words from the
cache, and cache misses from a cacheable region read words directly from memory.
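

The three control bits in Table B-9 combine freely. The sketch below shows one possible combination a debugger might choose: both linefill-disable bits set so that cache contents are preserved, plus DWB so that downloads reach external memory. The macro and function names are hypothetical, and the choice to set all three bits together is an assumption for illustration rather than a requirement stated in this appendix.

```c
#include <stdint.h>

/* Bit masks from Table B-9. */
#define CACHE_DBG_DDL (1u << 0)  /* disable DCache linefill                 */
#define CACHE_DBG_DIL (1u << 1)  /* disable ICache linefill                 */
#define CACHE_DBG_DWB (1u << 2)  /* disable write-back, force write-through */

/* Example value: preserve both caches and force write-through. */
static uint32_t cache_debug_preserve_all(void)
{
    return CACHE_DBG_DDL | CACHE_DBG_DIL | CACHE_DBG_DWB;
}

/* Report whether a register value leaves all reserved bits [31:3] zero. */
static int cache_debug_value_is_legal(uint32_t value)
{
    return (value & ~0x7u) == 0u;
}
```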


B.1.6 MMU Debug Control Register


You can use the MMU Debug Control Register to enable TLB and micro TLB entries
to be preserved during debug. For debug to be non-invasive, bits [5:0] must be set to
b111111 prior to changing any other CP15 registers, or issuing any system speed load
or store. If main TLB loading is disabled, page table walks still take place, but the
resultant data is forwarded around the TLB.


It might be necessary to temporarily change the contents of a page table entry to
facilitate debug operations. Disabling main TLB matches using bit 6 or 7 enables the
modified contents of the page table to be used for an access without having to invalidate
any entries in the main TLB.


You can access the MMU Debug Control Register using the following instructions:


MRC{cond} p15,7,<Rd>,c15,c1,0 ; read MMU debug control register
MCR{cond} p15,7,<Rd>,c15,c1,0 ; write MMU debug control register


The MMU Debug Control Register format is shown in Figure B-8.


31     8 7       6       5       4       3       2       1       0
SBZ      DMTMI   DMTMD   DMTLI   DMTLD   DIUTM   DDUTM   DIUTL   DDUTL


Figure B-8 MMU Debug Control Register format


The MMU Debug Control Register bit assignments are given in Table B-10. The reset
value of the MMU Debug Control Register is 0x0.


Table B-10 MMU Debug Control Register bit assignments

Bit      Name    Function                                              Description
[31:8]   -       Reserved                                              Read = Unpredictable
                                                                       Write = Should Be Zero
[7]      DMTMI   Disable main TLB matching for instruction fetches     0 = Enable matching
                                                                       1 = Disable matching
[6]      DMTMD   Disable main TLB matching for data accesses           0 = Enable matching
                                                                       1 = Disable matching
[5]      DMTLI   Disable main TLB load because of instruction          0 = Enable TLB load
                 fetch miss                                            1 = Disable TLB load
[4]      DMTLD   Disable main TLB load because of data access miss     0 = Enable TLB load
                                                                       1 = Disable TLB load
[3]      DIUTM   Disable instruction micro TLB match                   0 = Enable I-micro TLB match
                                                                       1 = Disable I-micro TLB match
[2]      DDUTM   Disable data micro TLB match                          0 = Enable D-micro TLB match
                                                                       1 = Disable D-micro TLB match
[1]      DIUTL   Disable instruction micro TLB load                    0 = Enable I-micro TLB load
                                                                       1 = Disable I-micro TLB load
[0]      DDUTL   Disable data micro TLB load                           0 = Enable D-micro TLB load
                                                                       1 = Disable D-micro TLB load
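

The requirement that bits [5:0] be b111111 for non-invasive debug can be expressed as a mask check. This is a sketch only: the macro names follow the bit names in the MMU Debug Control Register, while the helper function and the composite mask name are hypothetical.

```c
#include <stdint.h>

/* Bit masks named after the fields of the MMU Debug Control Register. */
#define MMU_DBG_DDUTL (1u << 0)
#define MMU_DBG_DIUTL (1u << 1)
#define MMU_DBG_DDUTM (1u << 2)
#define MMU_DBG_DIUTM (1u << 3)
#define MMU_DBG_DMTLD (1u << 4)
#define MMU_DBG_DMTLI (1u << 5)
#define MMU_DBG_DMTMD (1u << 6)
#define MMU_DBG_DMTMI (1u << 7)

/* Bits [5:0] all set, the value the text requires before any other
 * CP15 register is changed during non-invasive debug. */
#define MMU_DBG_PRESERVE_TLBS 0x3Fu

/* Return nonzero if the register value has every bit of [5:0] set. */
static int mmu_debug_preserves_tlbs(uint32_t reg)
{
    return (reg & MMU_DBG_PRESERVE_TLBS) == MMU_DBG_PRESERVE_TLBS;
}
```

A debugger could read the register, apply this check, and write back the value with bits [5:0] set if the check fails.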


B.1.7 Memory Region Remap Register


The read/write Memory Region Remap Register overrides the setting specified in the
MMU page tables, and the default behavior if the MMU is disabled.


The Memory Region Remap Register has four fields for remapping instruction-side memory
regions and four fields for remapping data-side memory regions.


You can access the Memory Region Remap Register with the instructions in
Table B-11.


Table B-11 Memory Region Remap Register instructions

Instruction                  Operation
MRC p15, 0, Rd, c15, c2, 0   Read Memory Region Remap Register
MCR p15, 0, Rd, c15, c2, 0   Write Memory Region Remap Register


Figure B-9 shows the bit fields of the Memory Region Remap Register.


31   16 15  14 13  12 11   10 9     8 7   6 5   4 3    2 1     0
SBZ     IWB    IWT    INCB    INCNB   DWB   DWT   DNCB   DNCNB


Figure B-9 Memory Region Remap Register format


Table B-12 describes the bit fields of the Memory Region Remap Register. 


Table B-12 Encoding of the Memory Region Remap Register

Bit       Name    Definition                                                                 Reset state
[31:16]   -       Should Be Zero                                                             0x0000
[15:14]   IWB     Remap select bits for instruction-side write-back region                   b11
[13:12]   IWT     Remap select bits for instruction-side write-through region                b10
[11:10]   INCB    Remap select bits for instruction-side noncacheable bufferable region      b01
[9:8]     INCNB   Remap select bits for instruction-side noncacheable nonbufferable region   b00
[7:6]     DWB     Remap select bits for data-side write-back region                          b11
[5:4]     DWT     Remap select bits for data-side write-through region                       b10
[3:2]     DNCB    Remap select bits for data-side noncacheable bufferable region             b01
[1:0]     DNCNB   Remap select bits for data-side noncacheable nonbufferable region          b00


Table B-13 shows the encoding of each of the remap fields. 


Table B-13 Encoding of the remap fields

Encoding   Definition
b00        noncacheable nonbufferable
b01        noncacheable bufferable
b10        write-through
b11        write-back
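

To make the field layout of Table B-12 and the encodings of Table B-13 concrete, the sketch below packs the eight 2-bit fields into a register value. With every region mapped to itself, the result matches the reset states listed in Table B-12. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Remap field encodings from Table B-13. */
enum remap {
    REMAP_NCNB = 0,  /* noncacheable nonbufferable */
    REMAP_NCB  = 1,  /* noncacheable bufferable    */
    REMAP_WT   = 2,  /* write-through              */
    REMAP_WB   = 3   /* write-back                 */
};

/* Pack the four fields for one side: WB region in [7:6], WT region in
 * [5:4], NCB region in [3:2], NCNB region in [1:0]. */
static uint32_t remap_side(enum remap wb, enum remap wt,
                           enum remap ncb, enum remap ncnb)
{
    return ((uint32_t)wb << 6) | ((uint32_t)wt << 4)
         | ((uint32_t)ncb << 2) | (uint32_t)ncnb;
}

/* Instruction-side fields occupy [15:8], data-side fields [7:0]. */
static uint32_t remap_register(uint32_t instruction_side, uint32_t data_side)
{
    return ((instruction_side & 0xFFu) << 8) | (data_side & 0xFFu);
}
```

For instance, treating data-side write-back regions as write-through while leaving everything else unchanged uses remap_side(REMAP_WT, REMAP_WT, REMAP_NCB, REMAP_NCNB) for the data side.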


Figure B-10 shows the flow and precedence of CP15 c15 control bits in
resolving the cacheable and bufferable attributes of a memory reference.


[Flow diagram. Labeled elements: the C and B bits of the page table descriptor; the M
bit of the Control Register; the MDDEB bit (MMU disabled, DCache enabled) of the
Debug Override Register; memory region remapping by the Memory Region Remap
Register; the force NCB store to be NCNB bit of the Debug Override Register. The
attribute at each stage is one of NCNB, NCB, CNB (write-through), or CB (write-back).]


Figure B-10 Memory region attribute resolution


This glossary describes some of the terms used in this manual. Where terms can have
several meanings, the meaning presented here is intended.


Abort
A mechanism that indicates to a core that it must halt execution of an attempted illegal
memory access. An abort can be caused by the external or internal memory system as a
result of attempting to access invalid instruction or data memory. An abort is classified
as either a Prefetch or Data Abort, and an internal or External Abort.

See also Data Abort, External Abort and Prefetch Abort.


Abort model
An abort model is the defined behavior of an ARM processor in response to a Data
Abort exception. Different abort models behave differently with regard to load and store
instructions that specify base register write-back.


Access permission
The mechanism that controls if a task or process is allowed to access sections or pages
of memory. If an access is attempted to an area of memory without the required
permissions, a permission fault is raised.


Addressing modes
A mechanism, shared by many different instructions, for generating values used by the
instructions. For four of the ARM addressing modes, the values generated are memory
addresses (which is the traditional role of an addressing mode). A fifth addressing mode
generates values to be used as operands by data-processing instructions.


Advanced High-performance Bus (AHB)
The AMBA Advanced High-performance Bus system connects embedded processors
such as an ARM core to high-performance peripherals, DMA controllers, on-chip
memory, and interfaces. It is a high-speed, high-bandwidth bus that supports
multi-master bus management to maximize system performance.


See also Advanced Microcontroller Bus Architecture and AHB-Lite. 


Advanced Microcontroller Bus Architecture (AMBA) 
AMBA is the ARM open standard for multi-master on-chip buses, capable of running
with multiple masters and slaves. It is an on-chip bus specification that describes a
strategy for the interconnection and management of functional blocks that make up a
System-on-Chip (SoC). It aids in the development of embedded processors with one or
more CPUs or signal processors and multiple peripherals. AMBA complements a
reusable design methodology by defining a common backbone for SoC modules. AHB
conforms to this standard.


Advanced Peripheral Bus (APB)
The AMBA Advanced Peripheral Bus is a simpler bus protocol than AHB. It is designed
for use with ancillary or general-purpose peripherals such as timers, interrupt
controllers, UARTs, and I/O ports. Connection to the main system bus is through a
system-to-peripheral bus bridge that helps to reduce system power consumption.


See also Advanced High-performance Bus. 
AHB See Advanced High-performance Bus. 


Aligned Aligned data items are stored so that their address is divisible by the highest power of
two that divides their size. Aligned words and halfwords have addresses that are 
divisible by four and two respectively. The terms word-aligned and halfword-aligned 
therefore stipulate addresses that are divisible by four and two respectively. Other 
related terms are defined similarly. 


AMBA See Advanced Microcontroller Bus Architecture. 
AP See Access permission. 
APB See Advanced Peripheral Bus. 


Application Specific Integrated Circuit (ASIC) 
An integrated circuit that has been designed to perform a specific application function. 
It can be custom-built or mass-produced. 


Application Specific Standard Part/Product (ASSP) 
An integrated circuit that has been designed to perform a specific application function. 
Usually consists of two or more separate circuit functions combined as a building block 
suitable for use in a range of products for one or more specific application markets.


Architecture
The organization of hardware and/or software that characterizes a processor and its
attached components, and enables devices with similar characteristics to be grouped
together when describing their behavior, for example, Harvard architecture, instruction
set architecture, ARMv6 architecture.


ARM instruction
A word that specifies an operation for an ARM processor to perform. ARM
instructions must be word-aligned.


ARM state
A processor that is executing ARM (32-bit) word-aligned instructions is operating in
ARM state.


ASIC See Application Specific Integrated Circuit.
ASSP See Application Specific Standard Part/Product.
ATPG See Automatic Test Pattern Generation.


Automatic Test Pattern Generation (ATPG)
The process of automatically generating manufacturing test vectors for an ASIC design,
using a specialized software tool.


Back-annotation
The process of applying timing characteristics from the implementation process onto a
model.


Banked registers
Those physical registers whose use is defined by the current processor mode. The
banked registers are r8 to r14.


Base register
A register specified by a load or store instruction that is used to hold the base value for
the instruction's address calculation. Depending on the instruction and its addressing
mode, an offset can be added to or subtracted from the base register value to form the
virtual address which is sent to memory.


Base register write-back
Updating the contents of the base register used in an instruction target address
calculation so that the modified address is changed to the next higher or lower
sequential address in memory. This means that it is not necessary to fetch the target
address for successive instruction transfers and enables faster burst accesses to
sequential memory.


Beat
Alternative word for an individual transfer within a burst. For example, an INCR4 burst
comprises four beats.

See also Burst.


Big-endian
Byte ordering scheme in which bytes of decreasing significance in a data word are
stored at increasing addresses in memory.

See also Little-endian and Endianness.


Big-endian memory
Memory in which:
o a byte or halfword at a word-aligned address is the most significant
  byte or halfword within the word at that address
o a byte at a halfword-aligned address is the most significant byte
  within the halfword at that address.

See also Little-endian memory.


Block address
An address that comprises a tag, an index, and a word field. The tag bits identify the way
that contains the matching cache entry for a cache hit. The index bits identify the set
being addressed. The word field contains the word address that can be used to identify
specific words, halfwords, or bytes within the cache entry.

See also Cache terminology diagram on the last page of this glossary.


Boundary scan chain
A boundary scan chain is made up of serially-connected devices that implement
boundary scan technology using a standard JTAG TAP interface. Each device contains
at least one TAP controller containing shift registers that form the chain connected
between TDI and TDO, through which test data is shifted. Processors can contain
several shift registers to enable you to access selected parts of the device.


Breakpoint
A breakpoint is a mechanism provided by debuggers to identify an instruction at which
program execution is to be halted. Breakpoints are inserted by the programmer to enable
inspection of register contents, memory locations, and variable values at fixed points in the
program execution to test that the program is operating correctly. Breakpoints are
removed after the program is successfully tested.

See also Watchpoint.


Burst
A group of transfers to consecutive addresses. Because the addresses are consecutive,
there is no requirement to supply an address for any of the transfers after the first one.
This increases the speed at which the group of transfers can occur. Bursts over AHB
buses are controlled using the HBURST signals to specify if transfers are single,
four-beat, eight-beat, or 16-beat bursts, and to specify how the addresses are
incremented.

See also Beat.


Bus Interface Unit
The Bus Interface Unit (BIU) controls all data accesses across the AHB. It arbitrates and
schedules AHB requests.


Byte
An 8-bit data item.


Cache
A block of on-chip or off-chip fast access memory locations, situated between the
processor and main memory, used for storing and retrieving copies of often used
instructions and/or data. This is done to greatly increase the average speed of memory
accesses and so to increase processor performance.

See also Cache terminology diagram on the last page of this glossary.
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Cache contention 


Cache hit 


Cache line 


Cache line index 


Cache lockdown 


Cache miss 


Cache set 


Cache way 


CAM 
Cast out 


Clean 


Clock gating 
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When the number of frequently-used memory cache lines that use a particular cache set 
exceeds the set-associativity of the cache. In this case, main memory activity increases 
and performance decreases. 


À memory access that can be processed at high speed because the instruction or data 
that 1t addresses 1s already held 1n the cache. 


The basic unit of storage in a cache. It is always a power of two words in size (usually 
four or 8 words), and 1s required to be aligned to a suitable memory boundary. 


See also Cache terminology diagram on the last page of this glossary. 


The number associated with each cache line 1n a cache way. Within each cache way, the 
cache lines are numbered from O to (set associativity) -1. 


See also Cache terminology diagram on the last page of this glossary. 


To fix a line in cache memory so that 1t cannot be overwritten. Cache lockdown enables 
critical instructions and/or data to be loaded into the cache so that the cache lines 
containing them are not subsequently reallocated. This ensures that all subsequent 
accesses to the instructions/data concerned are cache hits, and therefore complete as 
quickly as possible. 


A memory access that cannot be processed at high speed because the instruction/data it 
addresses is not in the cache and a main memory access is required. 


A cache set is a group of cache lines (or blocks). A set contains all the ways that can be 
addressed with the same index. The number of cache sets is always a power of two. 


See also Cache terminology diagram on the last page of this glossary. 

A group of cache lines (or blocks). It is 2 to the power of the number of index bits in size. 
See also Cache terminology diagram on the last page of this glossary. 

See Content Addressable Memory. 

See Victim. 


A cache line that has not been modified while it is in the cache is said to be clean. To 
clean a cache is to write dirty cache entries into main memory. If a cache line is clean, 
it is not written on a cache miss because the next level of memory contains the same 
data as the cache. 


See also Dirty. 


Gating a clock signal for a macrocell with a control signal and using the modified clock 
that results to control the operating state of the macrocell. 
Clocks Per Instruction 


Coherency 


Cold reset 


See Cycles Per Instruction. 


See Memory coherency. 


Also known as power-on reset. Starting the processor by turning power on. Turning 
power off and then back on again clears main memory and many internal settings. Some 
program failures can lock up the processor and require a cold reset to enable the system 
to be used again. In other cases, only a warm reset is required. 


See also Warm reset. 


Communications channel 


The hardware used for communicating between the software running on the processor, 
and an external host, using the debug interface. When this communication is for debug 
purposes, it is called the Debug Comms Channel. In an ARMv6 compliant core, the 
communications channel includes the Data Transfer Register, some bits of the Data 
Status and Control Register, and the external debug interface controller, such as the 
DBGTAP controller in the case of the JTAG interface. 


Condensed Reference Format (CRF) 


Condition field 


Conditional execution 


An ARM proprietary file format for specifying test vectors. 


A 4-bit field in an instruction that is used to specify a condition under which the 
instruction can execute. 


If the condition code flags indicate that the corresponding condition is true when the 
instruction starts executing, it executes normally. Otherwise, the instruction does 
nothing. 
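
The relationship between the condition field and the condition code flags can be sketched as follows. This is a minimal illustration, not part of the processor documentation: the function name condition_passed is hypothetical, and the 4-bit encodings are the standard ARM condition codes, which this glossary entry does not list.

```python
def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with 4-bit condition field `cond`
    would execute, given the N, Z, C and V condition code flags.

    Hypothetical helper for illustration only.
    """
    checks = {
        0b0000: z,                    # EQ: equal
        0b0001: not z,                # NE: not equal
        0b0010: c,                    # CS/HS: carry set
        0b0011: not c,                # CC/LO: carry clear
        0b0100: n,                    # MI: negative
        0b0101: not n,                # PL: positive or zero
        0b0110: v,                    # VS: overflow
        0b0111: not v,                # VC: no overflow
        0b1000: c and not z,          # HI: unsigned higher
        0b1001: (not c) or z,         # LS: unsigned lower or same
        0b1010: n == v,               # GE: signed greater than or equal
        0b1011: n != v,               # LT: signed less than
        0b1100: (not z) and n == v,   # GT: signed greater than
        0b1101: z or n != v,          # LE: signed less than or equal
        0b1110: True,                 # AL: always
    }
    if cond not in checks:
        # 0b1111 is not an ordinary condition and is not modeled here.
        raise ValueError("unsupported condition field: %r" % (cond,))
    return bool(checks[cond])
```

When the function returns False, the instruction behaves as described above and does nothing.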


Content Addressable Memory (CAM) 


Context 
Memory that is identified by its contents. Content Addressable Memory is used in 
CAM-RAM architecture caches to store the tags for cache entries. 


CAM includes comparison logic with each bit of storage. A data value is broadcast to 
all words of storage and compared with the values there. Words that match are flagged 
in some way. Subsequent operations can then work on flagged words. It is possible to 
read the flagged words out one at a time or write to certain bit positions in all of them. 


The environment that each process operates in for a multitasking operating system. In 
ARM processors, this is limited to mean the Physical Address range that it can access 
in memory and the associated memory access permissions. 


See also Fast context switch. 
Control bits 


Coprocessor 


Copy back 


Core 


Core module 


Core reset 
CPI 

CPSR 
CRF 


Glossary 


The bottom eight bits of a Program Status Register (PSR). The control bits change when 
an exception arises and can be altered by software only when the processor is in a 
privileged mode. 
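
As an illustration of the control bits, the sketch below decodes the bottom eight bits of a PSR value. The bit positions (I, F, T, and the five mode bits) and the mode encodings are the standard ARM ones and are not given in this glossary entry; the function and dictionary names are hypothetical.

```python
# Standard ARM mode encodings for PSR bits [4:0] (assumed, not from this entry).
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_control_bits(psr):
    """Decode the control bits (bits [7:0]) of a 32-bit PSR value."""
    control = psr & 0xFF
    return {
        "irq_disabled": bool(control & 0x80),  # I bit, bit 7
        "fiq_disabled": bool(control & 0x40),  # F bit, bit 6
        "thumb": bool(control & 0x20),         # T bit, bit 5
        "mode": MODES.get(control & 0x1F, "reserved"),
    }
```

For example, decode_control_bits(0x600000D3) reports Supervisor mode in ARM state with both IRQ and FIQ disabled.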


A processor that supplements the main processor. It carries out additional functions that 
the main processor cannot perform. Usually used for floating-point math calculations, 
signal processing, or memory management. 


See Write-back. 


A core is that part of a processor that contains the ALU, the datapath, the 
general-purpose registers, the Program Counter, and the instruction decode and control 
circuitry. 


In the context of an ARM Integrator, a core module is an add-on development board that 
contains an ARM processor and local memory. Core modules can run standalone, or can 
be stacked onto Integrator motherboards. 


See Warm reset. 
See Cycles per instruction. 
See Current Program Status Register. 


See Condensed Reference Format. 


Current Program Status Register (CPSR) 


The register that holds the current operating processor status. 


Cycles Per Instruction (CPI) 


Data Abort 


Data cache 


DBGTAP 
Cycles per instruction (or clocks per instruction) is a measure of the average number of 
clock cycles needed to execute one instruction. This figure of merit can 
be used to compare the performance of different CPUs against each other. The lower the 
value, the better the performance. 
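
A minimal sketch of the calculation, assuming cycle and instruction counts are available from a simulator or performance counter. The function name and the figures in the usage note are illustrative only and are not measurements of any processor.

```python
def cycles_per_instruction(total_cycles, instructions_executed):
    """Average number of clock cycles taken per executed instruction."""
    if instructions_executed <= 0:
        raise ValueError("instructions_executed must be positive")
    return total_cycles / instructions_executed
```

With these made-up figures, a run of 1000 instructions that takes 1500 cycles has a CPI of 1.5, and a run of the same instructions that takes 1200 cycles has a CPI of 1.2, which indicates the better performance.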


An indication from a memory system to a core that it must halt execution of an 
attempted illegal memory access. A Data Abort is caused by an attempt to access invalid data 
memory. 


See also Abort, External Abort, and Prefetch Abort. 


A block of on-chip fast access memory locations, situated between the processor and 
main memory, used for storing and retrieving copies of often used data. This is done to 
greatly reduce the average time taken by memory accesses and so to increase processor 
performance. 


See Debug Test Access Port. 
DCache 


Debugger 


A block of on-chip fast access memory locations, situated between the processor and 
main memory, used for storing and retrieving copies of often used data. This is done to 
greatly reduce the average time taken by memory accesses and so to increase processor 
performance. 


A debugging system that includes a program, used to detect, locate, and correct software 
faults, together with custom hardware that supports software debugging. 


Debug Test Access Port (DBGTAP) 


Direct-mapped cache 


The collection of four mandatory and one optional terminals that form the input/output 
and control interface to a JTAG boundary-scan architecture. The mandatory terminals 
are DBGTDI, DBGTDO, DBGTMS, and TCK. The optional terminal is TRST 
(DBGnTRST). This signal is mandatory in ARM cores because it is used to reset the 
debug logic. 


A one-way set-associative cache. Each cache set consists of a single cache line, so cache 
look-up selects and checks a single cache line. 


Direct Memory Access (DMA) 


Dirty 


DMA 
DNM 


Domain 


Do Not Modify (DNM) 


Doubleword 
An operation that accesses main memory directly, without the processor performing any 
accesses to the data concerned. 


A cache line in a write-back cache that has been modified while it is in the cache is said 
to be dirty. A cache line is marked as dirty by setting the dirty bit. If a cache line is dirty, 
it must be written to memory on a cache miss because the next level of memory contains 
data that has not been updated. The process of writing dirty data to main memory is 
called cache cleaning. 


See also Clean. 
See Direct Memory Access. 
See Do Not Modify. 


A collection of sections, large pages and small pages of memory, which can have their 
access permissions switched rapidly by writing to the Domain Access Control Register 
(CP15 register c3). 


In Do Not Modify fields, the value must not be altered by software. DNM fields read as 
Unpredictable values, and must only be written with the same value read from the same 
field on the same processor. Throughout this manual, DNM fields are sometimes 
followed by RAZ or RAO in parentheses to show which way the bits should read for 
future compatibility, but programmers must not rely on this behavior. 


A 64-bit data item. The contents are taken as being an unsigned integer unless otherwise 
stated. 
EmbeddedICE logic 


EmbeddedICE-RT 




A data item having a memory address that is divisible by 8. 


An on-chip logic block that provides TAP-based debug support for ARM processor 
cores. It is accessed through the TAP controller on the ARM core using the JTAG 
interface. 


The JTAG-based hardware provided by debuggable ARM processors to aid debugging 
in real time. 


Embedded Trace Buffer 


The ETB provides on-chip storage of trace data using a configurable sized RAM. 


Embedded Trace Macrocell (ETM) 


Endianness 


ETM 


Event 


Exception 


A hardware macrocell which, when connected to a processor core, outputs instruction 
and data trace information on a trace port. The ETM provides processor driven trace 
through a trace port compliant to the ATB protocol. 


Byte ordering. The scheme that determines the order in which successive bytes of a data 
word are stored in memory. An aspect of the system's memory mapping. 


See also Little-endian and Big-endian. 
See Embedded Trace Macrocell. 


1 (Simple) An observable condition that can be used by an ETM to control aspects of a 
trace. 


2 (Complex) A boolean combination of simple events that is used by an ETM to control 
aspects of a trace. 


A fault or error event that is considered serious enough to require that program 
execution is interrupted. Examples include attempting to perform an invalid memory 
access, external interrupts, and undefined instructions. When an exception occurs, 
normal program flow is interrupted and execution is resumed at the corresponding 
exception vector. This contains the first instruction of the interrupt handler to deal with 
the exception. 


Exception service routine 


Exception vector 


External Abort 
See Interrupt handler. 
See Interrupt vector. 


An indication from an external memory system to a core that it must halt execution of 
an attempted illegal memory access. An External Abort is caused by the external 
memory system as a result of attempting to access invalid memory. 


See also Abort, Data Abort and Prefetch Abort. 
Fast context switch 


In a multitasking system, the point at which the time-slice allocated to one process stops 
and the one for the next process starts. If processes are switched often enough, they can 
appear to a user to be running in parallel, in addition to being able to respond quicker to 
external events that might affect them. 


In ARM processors, a fast context switch is caused by the selection of a non-zero PID 
value to switch the context to that of the next process. A fast context switch causes each 
Virtual Address for a memory access, generated by the ARM processor, to produce a 
Modified Virtual Address which is sent to the rest of the memory system to be used in 
place of a normal Virtual Address. For some cache control operations Virtual Addresses 
are passed to the memory system as data. In these cases no address modification takes 
place. 


See also Fast Context Switch Extension. 


Fast Context Switch Extension (FCSE) 


FCSE 


Flat address mapping 


Fully-associative cache 


An extension to the ARM architecture that enables cached processors with an MMU to 
present different addresses to the rest of the memory system for different software 
processes, even when those processes are using identical addresses. 


See also Fast context switch. 


See Fast Context Switch Extension. 


A system of organizing memory in which each Physical Address contained within the 
memory space is the same as its corresponding Virtual Address. 


A cache that has just one cache set that consists of the entire cache. The number of cache 
entries is the same as the number of cache ways. 


See also Direct-mapped cache. 


Half-rate clocking (ETM) 


Halfword 


Halt mode 




Dividing the trace clock by two so that the TPA can sample trace data signals on both 
the rising and falling edges of the trace clock. The primary purpose of half-rate clocking 
is to reduce the signal transition rate on the trace clock of an ASIC for very high-speed 
systems. 


A 16-bit data item. 


One of two mutually exclusive debug modes. In halt mode all processor execution halts 
when a breakpoint or watchpoint is encountered. All processor state, coprocessor state, 
memory and input/output locations can be examined and altered by the JTAG interface. 


See also Monitor debug-mode. 


Copyright © 2001-2008 ARM Limited. All rights reserved. ARM DDI 0198E 




High vectors Alternative locations for exception vectors. The high vector address range is near the 
top of the address space, rather than at the bottom. 


Host A computer that provides data and other services to another computer. Especially, a 
computer providing debugging services to a target being debugged. 


ICache A block of on-chip fast access memory locations, situated between the processor and 
main memory, used for storing and retrieving copies of often used instructions. This is 
done to greatly reduce the average time taken by memory accesses and so to increase 
processor performance. 


IGN See Ignore. 

Ignore (IGN) Must ignore memory writes. 

Illegal instruction An instruction that is architecturally Undefined. 
IMB See Instruction Memory Barrier. 


Implementation-defined 
Means that the behavior is not architecturally defined, but should be defined and 
documented by individual implementations. 


Implementation-specific 
Means that the behavior is not architecturally defined, and does not have to be 
documented by individual implementations. Used when there are a number of 
implementation options available and the option chosen does not affect software 
compatibility. 


Index See Cache index. 
Index register A register specified in some load or store instructions. The value of this register is used 
as an offset to be added to or subtracted from the base register value to form the virtual 
address, which is sent to memory. Some addressing modes optionally enable the index 
register value to be shifted prior to the addition or subtraction. 
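
The address calculation described for the index register can be sketched as follows. This is an illustration under simplifying assumptions: only a logical shift left is modeled, addresses wrap at 32 bits, and the function and parameter names are hypothetical.

```python
def effective_address(base, index, shift=0, subtract=False):
    """Form a 32-bit virtual address from a base register value and an
    index register value that is optionally shifted left first."""
    offset = (index << shift) & 0xFFFFFFFF
    address = base - offset if subtract else base + offset
    return address & 0xFFFFFFFF  # wrap to 32 bits
```

For example, a base of 0x1000 with an index of 3 shifted left by 2 gives 0x100C when the offset is added and 0xFF4 when it is subtracted.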


Instruction cache A block of on-chip fast access memory locations, situated between the processor and 
main memory, used for storing and retrieving copies of often used instructions. This is 
done to greatly reduce the average time taken by memory accesses and so to increase 
processor performance. 


Instruction cycle count 
The number of cycles for which an instruction occupies the Execute stage of the 
pipeline. 


Instruction Memory Barrier (IMB) 
An operation to ensure that the prefetch buffer is flushed of all out-of-date instructions. 
Internal scan chain A series of registers connected together to form a path through a device, used during 
production testing to import test patterns into internal nodes of the device and export the 
resulting values. 


Interrupt handler A program to which control of the processor is passed when an interrupt occurs. 


Interrupt vector One of a number of fixed addresses in low memory, or in high memory if high vectors 
are configured, that contains the first instruction of the corresponding interrupt handler. 


Invalidate To mark a cache line as being not valid by clearing the valid bit. This must be done 
whenever the line does not contain a valid cache entry. For example, after a cache flush 
all lines are invalid. 


Joint Test Action Group (JTAG) 
The name of the organization that developed standard IEEE 1149.1. This standard 
defines a boundary-scan architecture used for in-circuit testing of integrated circuit 
devices. It is commonly known by the initials JTAG. 


JTAG See Joint Test Action Group. 
Line See Cache line. 
Little-endian Byte ordering scheme in which bytes of increasing significance in a data word are stored 
at increasing addresses in memory. 


See also Big-endian and Endianness. 


Little-endian memory 
Memory in which: 
o a byte or halfword at a word-aligned address is the least significant 
byte or halfword within the word at that address 
o a byte at a halfword-aligned address is the least significant byte within 
the halfword at that address. 


See also Big-endian memory. 
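
The byte ordering rules above can be checked with a short sketch. It uses Python's built-in integer-to-bytes conversion to lay out a 32-bit word as it would appear at increasing byte addresses; the helper name byte_at and the value 0x11223344 are illustrative choices.

```python
def byte_at(word, offset, byteorder):
    """Return the byte stored at `offset` (0 to 3) from the word's address
    when the 32-bit `word` is stored with the given byte order."""
    return word.to_bytes(4, byteorder)[offset]


word = 0x11223344
little = word.to_bytes(4, "little")  # bytes at increasing addresses
big = word.to_bytes(4, "big")
```

In the little-endian layout the byte at the word-aligned address is 0x44, the least significant byte, and the halfword at that address is 0x3344. In the big-endian layout the byte at the same address is 0x11.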


Load/store architecture 
A processor architecture where data-processing operations only operate on register 
contents, not directly on memory contents. 


Load Store Unit (LSU) 
The part of a processor that handles load and store transfers. 


LSU See Load Store Unit. 


Macrocell A complex logic block with a defined interface and behavior. A typical VLSI system 
comprises several macrocells (such as a processor, an ETM, and a memory block) plus 
application-specific logic. 
Memory bank One of two or more parallel divisions of interleaved memory, usually one word wide, 
that enable reads and writes of multiple words at a time, rather than single words. All 
memory banks are addressed simultaneously and a bank enable or chip select signal 
determines which of the banks is accessed for each transfer. Accesses to sequential 
word addresses cause accesses to sequential banks. This enables the delays associated 
with accessing a bank to occur during the access to its adjacent bank, speeding up 
memory transfers. 
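
A minimal sketch of word interleaving across banks, assuming four banks that are each one 32-bit word wide. The bank count, the word size, and the function name are assumptions for illustration and do not describe any particular memory system.

```python
def bank_for_address(byte_address, num_banks=4, word_bytes=4):
    """Return (bank, row) for a byte address in word-interleaved memory.

    Sequential word addresses map to sequential banks, so the access
    delay of one bank can overlap the access to the next bank.
    """
    word_address = byte_address // word_bytes
    return word_address % num_banks, word_address // num_banks
```

With these assumptions, byte addresses 0, 4, 8, 12, and 16 select banks 0, 1, 2, 3, and 0 in turn.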


Memory coherency A memory is coherent if the value read by a data read or instruction fetch is the value 
that was most recently written to that location. Memory coherency is made difficult 
when there are multiple possible physical locations that are involved, such as a system 
that has main memory, a write buffer and a cache. 


Memory Management Unit (MMU) 
Hardware that controls caches and access permissions to blocks of memory, and 
translates virtual addresses to physical addresses. 


Memory Protection Unit (MPU) 
Hardware that controls access permissions to blocks of memory. Unlike an MMU, an 
MPU does not translate virtual addresses to physical addresses. 


Microprocessor See Processor. 
Miss See Cache miss. 
MMU See Memory Management Unit. 


Modified Virtual Address (MVA) 
A Virtual Address produced by the ARM processor can be changed by the current 
Process ID to provide a Modified Virtual Address (MVA) for the MMUs and caches. 


See also Fast Context Switch Extension. 
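
The formation of an MVA can be sketched as follows. The sketch assumes the usual FCSE arrangement, which this glossary entry does not spell out: a 7-bit process ID replaces the top seven bits of any Virtual Address that lies in the bottom 32MB of the address space, and all other addresses pass through unchanged. The function name is hypothetical.

```python
def modified_virtual_address(va, fcse_pid):
    """Return the MVA for a 32-bit Virtual Address and a 7-bit FCSE PID."""
    if not 0 <= va <= 0xFFFFFFFF:
        raise ValueError("va must be a 32-bit address")
    if not 0 <= fcse_pid < 128:
        raise ValueError("fcse_pid must fit in 7 bits")
    if va >> 25 == 0:
        # VA[31:25] is zero: relocate into the process's 32MB block.
        return va | (fcse_pid << 25)
    return va  # addresses above 32MB are not modified
```

With a PID of zero the MVA always equals the VA, which matches the statement that a fast context switch is caused by selecting a non-zero PID.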


Monitor debug-mode 
One of two mutually exclusive debug modes. In Monitor debug-mode the processor 
enables a software abort handler provided by the debug monitor or operating system 
debug task. When a breakpoint or watchpoint is encountered, this enables vital system 
interrupts to continue to be serviced while normal program execution is suspended. 


See also Halt mode. 


MPU See Memory Protection Unit. 

Multi-ICE A JTAG-based tool for debugging embedded systems. 
MVA See Modified Virtual Address. 

NCB See Noncacheable Buffered. 
NCNB 


Noncacheable Buffered 


Noncacheable Nonbufferable 


PA 


Penalty 


Power-on reset 


Prefetching 


Prefetch Abort 


Processor 


Physical Address (PA) 


Read 


RealView ICE 
See Noncacheable Nonbufferable. 


Is a memory region where reads are performed from main memory and are not allocated 
to the cache. Writes are performed to main memory through a write buffer, so processor 
core execution can continue while the write is completed to main memory. 


Is a memory region where reads are performed from main memory and are not allocated 
to the cache. Writes are performed to main memory without buffering, so processor core 
execution is halted while the write is completed. 


See Physical Address. 


The number of cycles in which no useful Execute stage pipeline activity can occur 
because an instruction flow is different from that assumed or predicted. 


See Cold reset. 


In pipelined processors, the process of fetching instructions from memory to fill up the 
pipeline before the preceding instructions have finished executing. Prefetching an 
instruction does not mean that the instruction has to be executed. 


An indication from a memory system to a core that it must halt execution of an 
attempted illegal memory access. A Prefetch Abort can be caused by the external or 
internal memory system as a result of attempting to access invalid instruction memory. 


See also Data Abort, External Abort and Abort. 


A processor is the circuitry in a computer system required to process data using the 
computer instructions. It is an abbreviation of microprocessor. A clock source, power 
supplies, and main memory are also required to create a minimum complete working 
computer system. 


The MMU performs a translation on Modified Virtual Addresses (MVA) to produce the 
Physical Address (PA) which is given to AHB to perform an external access. The PA is 
also stored in the data cache to avoid the necessity for address translation when data is 
cast out of the cache. 


See also Fast Context Switch Extension. 


Reads are defined as memory operations that have the semantics of a load. That is, the 
ARM instructions LDM, LDRD, LDC, LDR, LDRT, LDRSH, LDRH, LDRSB, LDRB, 
LDRBT, LDREX, RFE, STREX, SWP, and SWPB, and the Thumb instructions LDM, 
LDR, LDRSH, LDRH, LDRSB, LDRB, and POP. Java instructions that are accelerated 
by hardware can cause a number of reads to occur, according to the state of the Java 
stack and the implementation of the Java hardware acceleration. 


A system for debugging embedded processor cores using a JTAG interface. 
Region 


Remapping 


Reserved 
A partition of instruction or data memory space. 


Changing the address of physical memory or devices after the application has started 
executing. This is typically done to allow RAM to replace ROM when the initialization 
has been completed. 


A field in a control register or instruction format is reserved if the field is to be defined 
by the implementation, or produces Unpredictable results if the contents of the field are 
not zero. These fields are reserved for use in future extensions of the architecture or are 
implementation-specific. All reserved bits not used by the implementation must be 
written as 0 and read as 0. 


Saved Program Status Register (SPSR) 


SBO 
SBZ 
SBZP 


Scan chain 


SCREG 
Set 


Set-associative cache 


Short vector operation 


Should Be One (SBO) 
The register that holds the CPSR of the task immediately before the exception occurred 
that caused the switch to the current mode. 


See Should Be One. 
See Should Be Zero. 
See Should Be Zero or Preserved. 


A scan chain is made up of serially-connected devices that implement boundary scan 
technology using a standard JTAG TAP interface. Each device contains at least one TAP 
controller containing shift registers that form the chain connected between TDI and 
TDO, through which test data is shifted. Processors can contain several shift registers 
to enable you to access selected parts of the device. 


The currently selected scan chain number in an ARM TAP controller. 


See Cache set. 


In a set-associative cache, lines can only be placed in the cache in locations that 
correspond to the modulo division of the memory address by the number of sets. If there 
are n ways in a cache, the cache is termed n-way set-associative. The set-associativity 
can be any number greater than or equal to 1 and is not restricted to being a power of 
two. 
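
The modulo placement rule can be sketched as follows. The cache geometry used here (16KB, 4 ways, 32-byte lines, giving 128 sets) is an assumption chosen for illustration, and the function names are hypothetical.

```python
def number_of_sets(cache_bytes, ways, line_bytes):
    """Number of cache sets for a given cache size, associativity and line size."""
    return cache_bytes // (ways * line_bytes)


def set_index(address, line_bytes=32, num_sets=128):
    """Set in which the line containing `address` must be placed.

    The line (block) address is divided modulo the number of sets.
    """
    return (address // line_bytes) % num_sets
```

With these figures, addresses 0x0000 and 0x1000 both map to set 0 and so compete for the same four ways, while address 0x0020 maps to set 1.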


An operation involving more than one destination register and perhaps more than one 
source register in the generation of the result for each destination. 


Should be written as 1 (or all 1s for bit fields) by software. Writing a 0 produces 
Unpredictable results. 
Should Be Zero (SBZ) 


Should be written as 0 (or all 0s for bit fields) by software. Writing a 1 produces 
Unpredictable results. 


Should Be Zero or Preserved (SBZP) 


SPICE 


SPSR 


Tag 


TAP 
TCM 


Test Access Port (TAP) 


Thumb instruction 


Thumb state 


Should be written as 0 (or all 0s for bit fields) by software, or preserved by writing the 
same value back that has been previously read from the same field on the same 
processor. 
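
Preserving such fields is normally done with a read-modify-write sequence. The sketch below shows only the bit manipulation, on a plain 32-bit integer; how the register is actually read and written is outside its scope, and the function name and field mask are hypothetical.

```python
def update_field(old_value, field_mask, new_field):
    """Return the value to write back: bits inside `field_mask` take
    `new_field`, every other bit keeps the value that was read."""
    if new_field & ~field_mask:
        raise ValueError("new_field has bits outside field_mask")
    preserved = old_value & ~field_mask & 0xFFFFFFFF
    return preserved | new_field
```

For example, update_field(0xA5A50078, 0x00000007, 0x5) returns 0xA5A5007D: only the low three bits change, and all other bits are written back as read.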


Simulation Program with Integrated Circuit Emphasis. An accurate transistor-level 
electronic circuit simulation tool that can be used to predict how an equivalent real 
circuit will behave for given circuit conditions. 


See Saved Program Status Register. 


The upper portion of a block address used to identify a cache line within a cache. The 
block address from the CPU is compared with each tag in a set in parallel to determine 
if the corresponding line is in the cache. If it is, it is said to be a cache hit and the line 
can be fetched from cache. If the block address does not correspond to any of the tags, 
it is said to be a cache miss and the line must be fetched from the next level of memory. 


See also Cache terminology diagram on the last page of this glossary. 
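
The tag comparison can be sketched as follows. The geometry (32-byte lines, 128 sets, 4 ways) and the representation of the cache as a list of sets, each holding one (valid, tag) pair per way, are assumptions made for illustration. In hardware the comparisons within a set happen in parallel; the loop here is sequential.

```python
LINE_BYTES = 32   # assumed line size
NUM_SETS = 128    # assumed number of sets


def split_address(address):
    """Split an address into (tag, index, offset within the line)."""
    offset = address % LINE_BYTES
    block_address = address // LINE_BYTES
    return block_address // NUM_SETS, block_address % NUM_SETS, offset


def is_hit(cache, address):
    """True if any valid way in the indexed set holds a matching tag."""
    tag, index, _ = split_address(address)
    return any(valid and stored_tag == tag for valid, stored_tag in cache[index])
```

A lookup that returns False corresponds to a cache miss, and the line must then be fetched from the next level of memory.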
See Test access port. 


See Tightly coupled memory. 


The collection of four mandatory and one optional terminals that form the input/output 
and control interface to a JTAG boundary-scan architecture. The mandatory terminals 
are TDI, TDO, TMS, and TCK. The optional terminal is TRST. This signal is 
mandatory in ARM cores because it is used to reset the debug logic. 


A halfword that specifies an operation for an ARM processor in Thumb state to 
perform. Thumb instructions must be halfword-aligned. 


A processor that is executing Thumb (16-bit) halfword aligned instructions is operating 
in Thumb state. 


Tightly coupled memory (TCM) 




An area of low latency memory that provides predictable instruction execution or data 
load timing in cases where deterministic performance is required. TCMs are suited to 
holding: 

o critical routines such as for interrupt handling 

o scratchpad data 

o data types whose locality is not suited to caching 

o critical data structures, such as interrupt stacks. 
See Translation Look-aside Buffer. 


Translation Lookaside Buffer (TLB) 


Translation table 


Translation table walk 


Undefined 


Unpredictable 


Unpredictable 


VA 


Victim 


Virtual Address (VA) 


Warm reset 
A cache of recently used page table entries that avoids the overhead of page table 
walking on every memory access. Part of the Memory Management Unit. 


A table, held in memory, that contains data that defines the properties of memory areas 
of various fixed sizes. 


The process of doing a full translation table lookup. It is performed automatically by 
hardware. 


Indicates an instruction that generates an Undefined instruction trap. See the ARM 
Architecture Reference Manual for more details on ARM exceptions. 


Means that the behavior of the ETM cannot be relied upon. Such conditions have not 
been validated. When applied to the programming of an event resource, only the output 
of that event resource is Unpredictable. Unpredictable behavior can affect the behavior 
of the entire system, because the ETM is capable of causing the core to enter debug 
state, and external outputs may be used for other purposes. 


For reads, the data returned when reading from this location is unpredictable. It can have 
any value. For writes, writing to this location causes unpredictable behavior, or an 
unpredictable change in device configuration. Unpredictable instructions must not halt 
or hang the processor, or any part of the system. 


See Virtual Address. 


A cache line, selected to be discarded to make room for a replacement cache line that is 
required as a result of a cache miss. The way in which the victim is selected for eviction 
is processor-specific. A victim is also known as a cast out. 


The MMU uses its page tables to translate a Virtual Address into a Physical Address. 
The processor executes code at the Virtual Address, which might be located elsewhere 
in physical memory. 


See also Fast Context Switch Extension, Modified Virtual Address, and Physical 
Address. 


Also known as a core reset. Initializes the majority of the processor excluding the debug 
controller and debug logic. This type of reset is useful if you are using the debugging 
features of a processor. 
Watchpoint 


Way 
WB 
Word 


Write 


Write-back (WB) 


Write buffer 


Write completion 


Write-through (WT) 


WT 
A watchpoint is a mechanism provided by debuggers to halt program execution when 
the data contained by a particular memory address is changed. Watchpoints are inserted 
by the programmer to allow inspection of register contents, memory locations, and 
variable values when memory is written to test that the program is operating correctly. 
Watchpoints are removed after the program is successfully tested. See also Breakpoint. 


See Cache way. 
See Write-back. 
A 32-bit data item. 


Writes are defined as operations that have the semantics of a store. That is, the ARM 
instructions SRS, STM, STRD, STC, STRT, STRH, STRB, STRBT, STREX, SWP, and 
SWPB, and the Thumb instructions STM, STR, STRH, STRB, and PUSH. Java 
instructions that are accelerated by hardware can cause a number of writes to occur, 
according to the state of the Java stack and the implementation of the Java hardware 
acceleration. 


In a write-back cache, data is only written to main memory when it is forced out of the 
cache on line replacement following a cache miss. Otherwise, writes by the processor 
only update the cache. (Also known as copyback). 


A block of high-speed memory, arranged as a FIFO buffer, between the data cache and 
main memory, whose purpose is to optimize stores to main memory. 


The memory system indicates to the processor that a write has been completed at a point 
in the transaction where the memory system is able to guarantee that the effect of the 
write is visible to all processors in the system. This is not the case if the write is 
associated with a memory synchronization primitive, or is to a Device or Strongly 
Ordered region. In these cases the memory system might only indicate completion of 
the write when the access has affected the state of the target, unless it is impossible to 
distinguish between having the effect of the write visible and having the state of target 
updated. 


This stricter requirement for some types of memory ensures that any side-effects of the 
memory access can be guaranteed by the processor to have taken place. You can use this 
to prevent the starting of a subsequent operation in the program order until the 
side-effects are visible. 


In a write-through cache, data is written to main memory at the same time as the cache 
is updated. 
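
The difference between the write-back and write-through policies defined in this glossary can be seen by counting main memory writes in a toy model. The model is a deliberate simplification and does not describe the ARM926EJ-S cache: it holds a single line, allocates on every write, and ignores reads. The class name is hypothetical.

```python
class OneLineCache:
    """Toy single-line cache that counts writes reaching main memory."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.line = None        # address of the line currently held
        self.dirty = False
        self.memory_writes = 0

    def write(self, address):
        if self.line != address:
            self._evict()       # line replacement
            self.line = address
        if self.write_back:
            self.dirty = True   # update the cache only
        else:
            self.memory_writes += 1  # write-through: update memory now

    def _evict(self):
        if self.dirty:
            self.memory_writes += 1  # dirty line is written on replacement
        self.dirty = False
```

For the write sequence 0, 0, 0, 1, the write-back model performs one main memory write (when the dirty line for address 0 is replaced), and the write-through model performs four.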


See Write-through. 
Cache terminology diagram 
The diagram below illustrates the following cache terminology: 


o block address 


o cache line 
o cache set 
o cache way 
o index 
o tag. 